bioFAM / MOFA2

Multi-Omics Factor Analysis
https://biofam.github.io/MOFA2/
GNU Lesser General Public License v3.0

Problem with load_model in R environment #74

Closed vinayduggi closed 3 years ago

vinayduggi commented 3 years ago

Hi,

I trained and saved the model using the Python notebooks. When I try to load the saved model into R, I get the error below. Can you please help me with this? Please let me know if I should upload any more data.

Warning message in load_model("/***/***/projects/mofapy2/mofapy2/models/COAD_cptac_gbm.hdf5", :
“There are duplicated features names across different views. We will add the suffix *_view* only for those features 
            Example: if you have both TP53 in mRNA and mutation data it will be renamed to TP53_mRNA, TP53_mutation”

ERROR: Error in .quality_control(object, verbose = verbose): !duplicated(unlist(features_names(object))) are not all TRUE

Error in .quality_control(object, verbose = verbose): !duplicated(unlist(features_names(object))) are not all TRUE
Traceback:

1. load_model("/****/****/projects/mofapy2/mofapy2/models/COAD_cptac_gbm.hdf5", 
 .     remove_inactive_factors = TRUE)
2. .quality_control(object, verbose = verbose)
3. stopifnot(!duplicated(unlist(features_names(object))))

The loader should have appended the view suffix if it had found any duplicate feature names across views.
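
One possible reading of the warning and the traceback, sketched in Python rather than taken from the MOFA2 source: the suffix is added only to names shared between views, so a name repeated inside a single view stays duplicated and the final uniqueness check still fails. The function names and the toy views below are hypothetical.

```python
from collections import Counter

def suffix_cross_view_duplicates(features_by_view):
    """Append '_<view>' to names that occur in more than one view.

    Hypothetical illustration of the behaviour described in the warning;
    this is not the MOFA2 implementation.
    """
    # Count in how many distinct views each name appears.
    views_per_name = Counter(
        name for names in features_by_view.values() for name in set(names)
    )
    return {
        view: [f"{n}_{view}" if views_per_name[n] > 1 else n for n in names]
        for view, names in features_by_view.items()
    }

def all_unique(features_by_view):
    """Mirror of the failing check: no name may repeat across all views."""
    flat = [n for names in features_by_view.values() for n in names]
    return len(flat) == len(set(flat))

views = {
    "mRNA": ["TP53", "A1BG"],
    "mutation": ["TP53"],
    "phospho": ["SEPT4_S115", "SEPT4_S115"],  # duplicate inside one view
}
renamed = suffix_cross_view_duplicates(views)
print(renamed["mRNA"])      # TP53 gets the view suffix, A1BG is untouched
print(all_unique(renamed))  # still False because of the within-view duplicate
```

Under this reading, renaming across views cannot help when the repeated names live inside one view.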

vinayduggi commented 3 years ago

I also tried to fix this by joining the feature column with the view column beforehand, so that R would not need to add the suffix when loading the trained model. This should have resolved the issue, but I got the same error again. Below is the dataframe I used for model training in this attempt.

0   C3L-00104   COAD    A1BG_CNV    -0.628254   CNV
1   C3L-00365   COAD    A1BG_CNV    0.304944    CNV
2   C3L-00674   COAD    A1BG_CNV    0.247958    CNV
3   C3L-00677   COAD    A1BG_CNV    0.271067    CNV
4   C3L-01040   COAD    A1BG_CNV    0.351638    CNV
...     ...     ...     ...     ...     ...
252583  C3L-03728   COAD    yR211F11.2_ENSG00000213076.3_transcriptomics    1884.303750     transcriptomics
252584  C3L-03744   COAD    yR211F11.2_ENSG00000213076.3_transcriptomics    23137.436372    transcriptomics
252585  C3L-03748   COAD    yR211F11.2_ENSG00000213076.3_transcriptomics    6814.774431     transcriptomics
252586  C3L-03968   COAD    yR211F11.2_ENSG00000213076.3_transcriptomics    17126.451805    transcriptomics
252587  C3L-04084   COAD    yR211F11.2_ENSG00000213076.3_transcriptomics    4295.846346     transcriptomics
vinayduggi commented 3 years ago

The MOFA+ model training also went smoothly:

Warning: some view(s) have less than 15 features, MOFA won't be able to learn meaningful factors for these view(s)...

######################################
## Training the model with seed 1 ##
######################################

ELBO before training: -59708145.59 

Iteration 1: time=2.61, ELBO=-7324248.07, deltaELBO=52383897.525 (87.73325148%), Factors=9
Iteration 2: time=2.28, ELBO=1092829.34, deltaELBO=8417077.407 (14.09703370%), Factors=9
Iteration 3: time=2.27, ELBO=2880597.57, deltaELBO=1787768.226 (2.99417811%), Factors=9
Iteration 4: time=2.28, ELBO=3031276.82, deltaELBO=150679.254 (0.25235963%), Factors=9
Iteration 5: time=2.28, ELBO=3119178.35, deltaELBO=87901.526 (0.14721865%), Factors=9
Iteration 6: time=2.20, ELBO=3180074.42, deltaELBO=60896.068 (0.10198955%), Factors=9
Iteration 7: time=2.19, ELBO=3225162.05, deltaELBO=45087.635 (0.07551337%), Factors=9
Iteration 8: time=2.22, ELBO=3258604.68, deltaELBO=33442.630 (0.05601016%), Factors=9
Iteration 9: time=2.22, ELBO=3283035.41, deltaELBO=24430.732 (0.04091692%), Factors=9
Iteration 10: time=2.19, ELBO=3300812.79, deltaELBO=17777.381 (0.02977380%), Factors=9
Iteration 11: time=2.19, ELBO=3314128.72, deltaELBO=13315.922 (0.02230168%), Factors=9
Iteration 12: time=2.18, ELBO=3324688.34, deltaELBO=10559.620 (0.01768539%), Factors=9
Iteration 13: time=2.20, ELBO=3333347.98, deltaELBO=8659.644 (0.01450329%), Factors=9
Iteration 14: time=2.21, ELBO=3340384.39, deltaELBO=7036.409 (0.01178467%), Factors=9
Iteration 15: time=2.25, ELBO=3346081.83, deltaELBO=5697.444 (0.00954215%), Factors=9
Iteration 16: time=2.18, ELBO=3350732.47, deltaELBO=4650.632 (0.00778894%), Factors=9
Iteration 17: time=2.22, ELBO=3354544.42, deltaELBO=3811.957 (0.00638432%), Factors=9
Iteration 18: time=2.27, ELBO=3357705.70, deltaELBO=3161.276 (0.00529455%), Factors=9
Iteration 19: time=2.27, ELBO=3360392.59, deltaELBO=2686.893 (0.00450004%), Factors=9
Iteration 20: time=2.28, ELBO=3362742.06, deltaELBO=2349.464 (0.00393491%), Factors=9
Iteration 21: time=2.22, ELBO=3364842.99, deltaELBO=2100.930 (0.00351867%), Factors=9
Iteration 22: time=2.24, ELBO=3366751.02, deltaELBO=1908.030 (0.00319559%), Factors=9
Iteration 23: time=2.28, ELBO=3368504.12, deltaELBO=1753.101 (0.00293612%), Factors=9
Iteration 24: time=2.24, ELBO=3370130.35, deltaELBO=1626.237 (0.00272364%), Factors=9
Iteration 25: time=2.25, ELBO=3371651.73, deltaELBO=1521.375 (0.00254802%), Factors=9
Iteration 26: time=2.29, ELBO=3373086.02, deltaELBO=1434.287 (0.00240216%), Factors=9
Iteration 27: time=2.29, ELBO=3374447.79, deltaELBO=1361.775 (0.00228072%), Factors=9
Iteration 28: time=2.29, ELBO=3375748.94, deltaELBO=1301.151 (0.00217919%), Factors=9
Iteration 29: time=2.24, ELBO=3376999.55, deltaELBO=1250.606 (0.00209453%), Factors=9
Iteration 30: time=2.31, ELBO=3378208.64, deltaELBO=1209.094 (0.00202501%), Factors=9
Iteration 31: time=2.28, ELBO=3379383.94, deltaELBO=1175.297 (0.00196840%), Factors=9
Iteration 32: time=2.27, ELBO=3380531.46, deltaELBO=1147.525 (0.00192189%), Factors=9
Iteration 33: time=2.27, ELBO=3381656.51, deltaELBO=1125.044 (0.00188424%), Factors=9
Iteration 34: time=2.28, ELBO=3382764.09, deltaELBO=1107.585 (0.00185500%), Factors=9
Iteration 35: time=2.27, ELBO=3383860.43, deltaELBO=1096.335 (0.00183616%), Factors=9
Iteration 36: time=2.26, ELBO=3384951.91, deltaELBO=1091.483 (0.00182803%), Factors=9
Iteration 37: time=2.24, ELBO=3386028.14, deltaELBO=1076.232 (0.00180249%), Factors=9
Iteration 38: time=2.26, ELBO=3387072.26, deltaELBO=1044.120 (0.00174871%), Factors=9
Iteration 39: time=2.28, ELBO=3388070.44, deltaELBO=998.176 (0.00167176%), Factors=9
Iteration 40: time=2.29, ELBO=3389028.11, deltaELBO=957.672 (0.00160392%), Factors=9
Iteration 41: time=2.27, ELBO=3389965.85, deltaELBO=937.741 (0.00157054%), Factors=9
Iteration 42: time=2.28, ELBO=3390898.08, deltaELBO=932.225 (0.00156130%), Factors=9
Iteration 43: time=2.28, ELBO=3391830.44, deltaELBO=932.359 (0.00156153%), Factors=9
Iteration 44: time=2.28, ELBO=3392764.80, deltaELBO=934.364 (0.00156488%), Factors=9
Iteration 45: time=2.28, ELBO=3393701.93, deltaELBO=937.128 (0.00156951%), Factors=9
Iteration 46: time=2.29, ELBO=3394642.38, deltaELBO=940.452 (0.00157508%), Factors=9
Iteration 47: time=2.22, ELBO=3395586.92, deltaELBO=944.539 (0.00158193%), Factors=9
Iteration 48: time=2.24, ELBO=3396536.38, deltaELBO=949.464 (0.00159017%), Factors=9
Iteration 49: time=2.22, ELBO=3397491.70, deltaELBO=955.319 (0.00159998%), Factors=9
Iteration 50: time=2.18, ELBO=4301925.13, deltaELBO=904433.425 (1.51475718%), Factors=9
Iteration 51: time=2.26, ELBO=4357187.19, deltaELBO=55262.063 (0.09255364%), Factors=9
Iteration 52: time=2.21, ELBO=4365975.64, deltaELBO=8788.445 (0.01471901%), Factors=9
Iteration 53: time=2.22, ELBO=4369320.62, deltaELBO=3344.985 (0.00560222%), Factors=9
Iteration 54: time=2.12, ELBO=4371111.54, deltaELBO=1790.919 (0.00299945%), Factors=9
Iteration 55: time=2.21, ELBO=4372245.67, deltaELBO=1134.126 (0.00189945%), Factors=9
Iteration 56: time=2.25, ELBO=4373049.05, deltaELBO=803.380 (0.00134551%), Factors=9
Iteration 57: time=2.26, ELBO=4373672.39, deltaELBO=623.345 (0.00104399%), Factors=9
Iteration 58: time=2.26, ELBO=4374193.65, deltaELBO=521.257 (0.00087301%), Factors=9
Iteration 59: time=2.25, ELBO=4374654.35, deltaELBO=460.706 (0.00077160%), Factors=9
Iteration 60: time=2.26, ELBO=4375078.78, deltaELBO=424.422 (0.00071083%), Factors=9
Iteration 61: time=2.21, ELBO=4375480.15, deltaELBO=401.375 (0.00067223%), Factors=9
Iteration 62: time=2.20, ELBO=4375867.77, deltaELBO=387.616 (0.00064918%), Factors=9
Iteration 63: time=2.19, ELBO=4376243.15, deltaELBO=375.388 (0.00062870%), Factors=9
Iteration 64: time=2.20, ELBO=4376612.10, deltaELBO=368.948 (0.00061792%), Factors=9
Iteration 65: time=2.24, ELBO=4376978.71, deltaELBO=366.611 (0.00061400%), Factors=9
Iteration 66: time=2.21, ELBO=4377345.44, deltaELBO=366.724 (0.00061419%), Factors=9
Iteration 67: time=2.17, ELBO=4377716.19, deltaELBO=370.752 (0.00062094%), Factors=9
Iteration 68: time=2.20, ELBO=4378096.97, deltaELBO=380.779 (0.00063773%), Factors=9
Iteration 69: time=2.25, ELBO=4378495.32, deltaELBO=398.352 (0.00066716%), Factors=9
Iteration 70: time=2.25, ELBO=4378920.04, deltaELBO=424.718 (0.00071132%), Factors=9
Iteration 71: time=2.22, ELBO=4379383.57, deltaELBO=463.529 (0.00077632%), Factors=9
Iteration 72: time=2.23, ELBO=4379910.23, deltaELBO=526.661 (0.00088206%), Factors=9
Iteration 73: time=2.26, ELBO=4380538.12, deltaELBO=627.892 (0.00105160%), Factors=9
Iteration 74: time=2.25, ELBO=4381328.49, deltaELBO=790.369 (0.00132372%), Factors=9
Iteration 75: time=2.31, ELBO=4382392.45, deltaELBO=1063.966 (0.00178194%), Factors=9
Iteration 76: time=2.27, ELBO=4383918.79, deltaELBO=1526.338 (0.00255633%), Factors=9
Iteration 77: time=2.24, ELBO=4386179.86, deltaELBO=2261.066 (0.00378686%), Factors=9
Iteration 78: time=2.24, ELBO=4389309.52, deltaELBO=3129.657 (0.00524159%), Factors=9
Iteration 79: time=2.21, ELBO=4392565.87, deltaELBO=3256.351 (0.00545378%), Factors=9
Iteration 80: time=2.16, ELBO=4394563.82, deltaELBO=1997.958 (0.00334621%), Factors=9
Iteration 81: time=2.21, ELBO=4395454.18, deltaELBO=890.357 (0.00149118%), Factors=9
Iteration 82: time=2.20, ELBO=4395904.98, deltaELBO=450.801 (0.00075501%), Factors=9
Iteration 83: time=2.20, ELBO=4396234.21, deltaELBO=329.230 (0.00055140%), Factors=9
Iteration 84: time=2.16, ELBO=4396536.22, deltaELBO=302.009 (0.00050581%), Factors=9
Iteration 85: time=2.19, ELBO=4396818.96, deltaELBO=282.738 (0.00047353%), Factors=9
Iteration 86: time=2.24, ELBO=4397085.03, deltaELBO=266.074 (0.00044562%), Factors=9

Converged!

#######################
## Training finished ##
#######################

Saving model in /*****/****/projects/mofapy2/mofapy2/models/COAD_cptac_gbm.hdf5...
gtca commented 3 years ago

Hey @vinayduggi, thanks for opening the issue, not sure I can reproduce it from the available info. Are there, by any chance, duplicated feature names in a view?

library(rhdf5)  # provides h5read()
# List, per view, the feature names that occur more than once
feature_names <- h5read('COAD_cptac_gbm.hdf5', 'features')
lapply(feature_names, function(e) e[duplicated(e)])
vinayduggi commented 3 years ago

Hey @gtca, thanks for the prompt response. Here is what I got from the above command. Should I remove the entries below from the respective views?


$CNV
$SNV
$acetylproteomics

        'CREBBP_K1583K1586K1587K1588_EESTAASETTEGSQGDSK#NAK'
        'GAPDH_K271_QASEGPLK#GILGYTEHQVVSSDFN$SDTHSSTFDAGAG'
        'SMARCC2_K373_DSESAPVK#GGTMTDLDEQEDESM*ETTGKDEDENST'
        'SMARCC2_K373_DSESAPVK#GGTMTDLDEQEDESMETTGKDEDEN$ST'

$cirRNA
$lipidomics
$metabolomics
$miRNA
$phosphoproteomics

        'ACIN1_S169S181_EAAELEEASAES*EDEMIHPEGVAS*LLPPDFQSS''AGAP2_S94S105_QDALWISTSSAGTGGAEPPALS*PAPASPARPVS*P''AKAP12_S154_SAVVHDITDDGQEETPEIIEQIPSSES*NLEELTQPTE''AKAP12_S742S743_ETGTDGILAGSQEHDPGQGSS*S*PEQAGSPTEG''AKAP12_S742S743T751_ETGTDGILAGSQEHDPGQGSS*S*PEQAGS''AKAP12_S743_ETGTDGILAGSQEHDPGQGSSS*PEQAGSPTEGEGVST''AKAP12_S743T751_ETGTDGILAGSQEHDPGQGSSS*PEQAGSPT*EG''AKAP12_S749_ETGTDGILAGSQEHDPGQGSSSPEQAGS*PTEGEGVST''AKAP12_S749T751_ETGTDGILAGSQEHDPGQGSSSPEQAGS*PT*EG''AKAP12_T135_SAVVHDIT*DDGQEETPEIIEQIPSSESNLEELTQPTE''AKAP12_T160_SAVVHDITDDGQEETPEIIEQIPSSESNLEELT*QPTE''AKAP12_T751_ETGTDGILAGSQEHDPGQGSSSPEQAGSPT*EGEGVST''ANK2_S3817S3818S3823_LYLQTPTS*S*ERGGS*PIIQEPEEPSEH''ANK2_T3814T3816S3823_LYLQT*PT*SSERGGS*PIIQEPEEPSEH''ANK2_T3816S3818S3823_LYLQTPT*SS*ERGGS*PIIQEPEEPSEH''AP3D1_S759S764S788_HSS*LPTES*DEDIAPAQQVDIVTEEMPENA''AP3D1_S764S788_HSSLPTES*DEDIAPAQQVDIVTEEMPENALPS*D''AP3D1_T903_KAEDLDFWLSTTPPPAPAPAPAPVPSTGELSVNTVTT*P''ARAP1_S207_EEESLLPSLSSPPQPQSEEPLSTLPQGPPQPPS*PPPCP''ARAP1_T197_EEESLLPSLSSPPQPQSEEPLST*LPQGPPQPPSPPPCP''ARHGAP12_S338_GDFQNPGDQELLSSEENYYSTSYSQSDSQCGS*PPR''ARHGAP17_S625_NNSQIASGQNQPQAAAGSHQLSMGQPHNAAGPS*PH''ARHGAP39_S488_HSQPPTPLPQAQEDAMSWSSQQDTLSSTGYS*PGTR''ARHGAP44_S604_GS*PGSSQGTACAGTQPGAQPGAQPGASPSPSQPPA''ARHGEF2_S941S947S960_LQDSS*DPDTGS*EEEGSSRLSPPHS*PR''ARHGEF2_S941S952S960_LQDSS*DPDTGSEEEGS*SRLSPPHS*PR''ARHGEF2_S941T945S947_LQDSS*DPDT*GS*EEEGSSRLSPPHSPR''ARHGEF26_S172_TPNAPAPCTPEEDLTGLTASPVPS*PTANGLAANND''ATCAY_S39_EEWQDEDLPRPLPEETGVELLGS*PVEDTSSPPNTLNFNG''ATCAY_S39S46_EEWQDEDLPRPLPEETGVELLGS*PVEDTSS*PPNTL''ATCAY_S45S46_EEWQDEDLPRPLPEETGVELLGSPVEDTS*S*PPNTL''ATCAY_S46_EEWQDEDLPRPLPEETGVELLGSPVEDTSS*PPNTLNFNG''ATCAY_T32S39_EEWQDEDLPRPLPEET*GVELLGS*PVEDTSSPPNTL''ATCAY_T50_EEWQDEDLPRPLPEETGVELLGSPVEDTSSPPNT*LNFNG''ATXN2L_S32_RPPGGTS*PPNGGLPGPLATSAAPPGPPAAASPCLGPVA''BCLAF1_S272S274S276_S*GS*GS*VGNGSSRYSPSQNSPIHHIPSR''BCLAF1_S281S282Y284_SGSGSVGNGS*S*RY*SPSQNSPIHHIPSR''BCLAF1_S282Y284S285_SGSGSVGNGSS*RY*S*PSQNSPIHHIPSR''BCLAF1_Y284S285
S287_SGSGSVGNGSSRY*S*PS*QNSPIHHIPSR''CEP170_S928S930S933_TDEGPDTPSYNRDNS*IS*PES*DVDTAST''CEP170_T920S922Y923_TDEGPDT*PS*Y*NRDNSISPESDVDTAST''CEP170_Y923S928_TDEGPDTPSY*NRDNS*ISPESDVDTASTISLVT''CEP170_Y923S928S933_TDEGPDTPSY*NRDNS*ISPES*DVDTAST''CHD4_S103S105S108_QLGDSSGEGPEFVEEEEEVALRS*DS*EGS*D''CHD4_S105S108Y110_QLGDSSGEGPEFVEEEEEVALRSDS*EGS*DY''DENND4C_S1608S1610_GSASFFLKPSTSGDSLQS*GS*IPLANESLE''DLGAP4_Y761_NLSY*GDNSDPALEASSLPPPDPWLETSSSSPAEPAQP''DNAJC6_S510S513_SFCEEDHAALVNQES*EQS*DDELLTLSSPHGNA''DNAJC6_S510S522_SFCEEDHAALVNQES*EQSDDELLTLSS*PHGNA''DNAJC6_S510T519_SFCEEDHAALVNQES*EQSDDELLT*LSSPHGNA''DNAJC6_S513S522_SFCEEDHAALVNQESEQS*DDELLTLSS*PHGNA''DNAJC6_S513T519_SFCEEDHAALVNQESEQS*DDELLT*LSSPHGNA''DNAJC6_S521S522_SFCEEDHAALVNQESEQSDDELLTLS*S*PHGNA''DNM2_S764_EALNIIGDISTSTVSTPVPPPVDDTWLQSASSHS*PTPQR''EDC4_S555_FQPQLNPDVVAPLPTHTAHEDFTFGESRPELGS*EGLGSA''EDC4_T537_FQPQLNPDVVAPLPT*HTAHEDFTFGESRPELGSEGLGSA''EED_S34_LSSDENSNPDLS*GDENDDAVSIESGTNTERPDTPTNTPNAP''EED_S34T55_LSSDENSNPDLS*GDENDDAVSIESGTNTERPDT*PTNT''EED_T50T55_LSSDENSNPDLSGDENDDAVSIESGTNT*ERPDT*PTNT''EEF1B2_S106_YGPADVEDTTGSGATDSKDDDDIDLFGS*DDEEESEEA''EIF4EBP1_S44T45_VVLGDGVQLPPGDYSTTPGGTLFS*T*TPGGTRI''EIF4EBP1_T36T37_VVLGDGVQLPPGDYST*T*PGGTLFSTTPGGTRI''EIF4EBP1_T36T45_RVVLGDGVQLPPGDYST*TPGGTLFST*TPGGTR''EIF4EBP1_T37T41_VVLGDGVQLPPGDYSTT*PGGT*LFSTTPGGTRI''EIF4EBP1_T37T46_VVLGDGVQLPPGDYSTT*PGGTLFSTT*PGGTRI''EIF4EBP1_T45T46_VVLGDGVQLPPGDYSTTPGGTLFST*T*PGGTRI''EIF4EBP1_T46Y54_VVLGDGVQLPPGDYSTTPGGTLFSTT*PGGTRII''EIF4EBP2_T36T41_TVAISDAAQLPHDYCT*TPGGT*LFSTTPGGTRI''EIF4EBP2_T36T45_TVAISDAAQLPHDYCT*TPGGTLFST*TPGGTRI''EIF4EBP2_T36T46_TVAISDAAQLPHDYCT*TPGGTLFSTT*PGGTRI''EIF4EBP2_T45T46_TVAISDAAQLPHDYCTTPGGTLFST*T*PGGTRI''EIF5_T178_ENGSVSSSET*PPPPPPPNEINPPPHTMEEEEDDDWGEDT''EP400_S961S962_LYEGAFLPS*S*QWPRPKPDGEDTSGEEDADDCPG''EP400_T974S975_LYEGAFLPSSQWPRPKPDGEDT*S*GEEDADDCPG''EPB41L1_S430S437_S*LDGAEFS*RPASVSENHDAGPDGDKRDEDGE''EPB41L1_S430S441S443_S*LDGAEFSRPAS*VS*ENHDAGPDGDKR''EPB41L1_S430S441S443_S*LDGAEFSRPAS*VS*E
NHDAGPDGDKR''EPB41L1_S437_SLDGAEFS*RPASVSENHDAGPDGDKRDEDGESGGQR''EPB41L1_S437S441S443_SLDGAEFS*RPAS*VS*ENHDAGPDGDKR''EPB41L1_S437S443_SLDGAEFS*RPASVS*ENHDAGPDGDKRDEDGE''EPB41L1_S441_SLDGAEFSRPAS*VSENHDAGPDGDKRDEDGESGGQR''EPB41L1_S441S443_SLDGAEFSRPAS*VS*ENHDAGPDGDKRDEDGE''EPB41L1_S443_SLDGAEFSRPASVS*ENHDAGPDGDKRDEDGESGGQR''EPB41L1_S443S461_SLDGAEFSRPASVS*ENHDAGPDGDKRDEDGES''EPB41L1_S461_SLDGAEFSRPASVSENHDAGPDGDKRDEDGES*GGQR''EPB41L1_S461S466_SLDGAEFSRPASVSENHDAGPDGDKRDEDGES*''EPB41L5_S420S436_SALPVSPS*ISSAPVPVEIENLPQS*PGTDQHD''EPB41L5_S423S436_SALPVSPSISS*APVPVEIENLPQS*PGTDQHD''FAM129A_S577S579_HNLFEDNMALPS*ES*VSSLTDLKPPTGSNQAS''FAM129A_S577S579S596_HNLFEDNMALPS*ES*VSSLTDLKPPTGS''FAM129A_S577S579T584_HNLFEDNMALPS*ES*VSSLT*DLKPPTG''FAM129A_S582S592_HNLFEDNMALPSESVSS*LTDLKPPTGS*NQAS''FAM129A_S582T584_HNLFEDNMALPSESVSS*LT*DLKPPTGSNQAS''FAM129A_S582T584S596_HNLFEDNMALPSESVSS*LT*DLKPPTGS''FAM129A_S592S596_HNLFEDNMALPSESVSSLTDLKPPTGS*NQAS*''FAM129A_T584S592S596_HNLFEDNMALPSESVSSLT*DLKPPTGS*''FAM129A_T584S596_HNLFEDNMALPSESVSSLT*DLKPPTGSNQAS*''FAM129A_T584T590_HNLFEDNMALPSESVSSLT*DLKPPT*GSNQAS''FAM129A_T584T590S596_HNLFEDNMALPSESVSSLT*DLKPPT*GS''FAM129A_T590S592_HNLFEDNMALPSESVSSLTDLKPPT*GS*NQAS''FAM129A_T590S596_HNLFEDNMALPSESVSSLTDLKPPT*GSNQAS*''FAM171A2_S789T791S792_SSASELRRDS*LT*S*PEDELGAEVGDE''FARP1_S403_SLASQPTELNSEVLEQSQQSTSLTFGEGAES*PGGQSCR''FMR1_S497S500_RGPGYTSGTNSEAS*NAS*ETESDHRDELSDWSLAP''FMR1_S500_RGPGYTSGTNSEASNAS*ETESDHRDELSDWSLAPTEEER''FMR1_S500S511_RGPGYTSGTNSEASNAS*ETESDHRDELS*DWSLAP''FMR1_S511_RGPGYTSGTNSEASNASETESDHRDELS*DWSLAPTEEER''GIGYF2_S357S359_EPIPEEQEMDFRPVDEGEECS*DS*EGSHNEEAK''GIGYF2_S357S359S362_EPIPEEQEMDFRPVDEGEECS*DS*EGS*H''GIGYF2_S359S362_EPIPEEQEMDFRPVDEGEECSDS*EGS*HNEEAK''HDGF_S229T248_NSTPSEPGS*GRGPPQEEEEEEDEEEEAT*KEDAEA''HDGFL3_S121S122_FTGYQAIQQQSSSETEGEGGNTADAS*S*EEEGD''HNRNPC_S253S260_MES*EGGADDS*AEEGDLLDDDDNEDRGDDQLEL''HSP90B1_T774_VEEEPEEEPEETAEDTTEDT*EQDEDEEMDVGTDEEE''HSPH1_S556S559_NVQQDNSEAGTQPQVQTDAQQTSQS*PPS*PELTS''HUWE1_S2953
_GILEEPLPSTSS*EEEDPLAGISLPEGVDPSFLAALPD''IRS2_S384S388_TASEGDGGAAAGAAAAGARPVS*VAGS*PLSPGPVR''KIAA1109_S4304_LFLGDQTINLPTSGPGTPDSIEGVS*QHLSPESSR''KIAA1109_S4308_LFLGDQTINLPTSGPGTPDSIEGVSQHLS*PESSR''KIAA1191_Y33S52_AVSY*DDTLEDPAPMTPPPSDMGS*VPWKPVIPE''KIF1A_S1531_LETAQRPVPEALSPAFSEDSESHGSSSASS*PLSAEGR''KIF2A_S624_ELTVDPTAAGDVRPIMHHPPNQIDDLETQWGVGSS*PQR''KIF2A_T617_ELTVDPTAAGDVRPIMHHPPNQIDDLET*QWGVGSSPQR''LCP2_S410_NFPLPLPNKPRPPS*PAEEENSLNEEWYVSYITRPEAEAA''LCP2_S417_NFPLPLPNKPRPPSPAEEENS*LNEEWYVSYITRPEAEAA''LCP2_T428_NFPLPLPNKPRPPSPAEEENSLNEEWYVSYIT*RPEAEAA''LCP2_Y426_NFPLPLPNKPRPPSPAEEENSLNEEWYVSY*ITRPEAEAA''MAP1A_S2408_SSRPDTLLS*PEQPVCPAGGSGGPPSSASPEVEAGPQG''MAP1A_S2408S2419_SSRPDTLLS*PEQPVCPAGGS*GGPPSSASPEV''MAP1A_S2408S2419S2427_SSRPDTLLS*PEQPVCPAGGS*GGPPSS''MAP1A_S2408S2427_SSRPDTLLS*PEQPVCPAGGSGGPPSSAS*PEV''MAP1A_S2425S2427_SSRPDTLLSPEQPVCPAGGSGGPPSS*AS*PEV''MAP1A_S2427_SSRPDTLLSPEQPVCPAGGSGGPPSSAS*PEVEAGPQG''MAP1A_T2042_SLQSDTPTFSYAALAGPT*VPPRPEPGPSMEPSLTPPA''MAP1A_T2058_SLQSDTPTFSYAALAGPTVPPRPEPGPSMEPSLT*PPA''MAP1B_S1396S1400S1408_VLS*PLRS*PPLIGSES*AYESFLSADD''MAP4K4_S324S326_DETEYEYS*GS*EEEEEEVPEQEGEPSSIVNVPG''MAP4K4_S326_DETEYEYSGS*EEEEEEVPEQEGEPSSIVNVPGESTLR''MAP4K4_S341S342_DETEYEYSGSEEEEEEVPEQEGEPS*S*IVNVPG''MARCKS_S118T120_EAPAEGEAAEPGS*PT*AAEGEAASAASSTSSPK''MARCKS_S118T120S132_EAPAEGEAAEPGS*PT*AAEGEAASAASS*''MARCKS_S128S131S132_EAPAEGEAAEPGSPTAAEGEAAS*AAS*S*''MARCKS_S131S132_EAPAEGEAAEPGSPTAAEGEAASAAS*S*TSSPK''MARCKS_T120S128_EAPAEGEAAEPGSPT*AAEGEAAS*AASSTSSPK''MARCKS_T120S131S132_EAPAEGEAAEPGSPT*AAEGEAASAAS*S*''MARCKS_T120S132_EAPAEGEAAEPGSPT*AAEGEAASAASS*TSSPK''MARCKS_T143S145S147_EAPAEGEAAEPGSPTAAEGEAASAASSTSS''MEAF6_S122_REPGSGTES*DTSPDFHNQENEPSQEDPEDLDGSVQGVK''MEAF6_S125_REPGSGTESDTS*PDFHNQENEPSQEDPEDLDGSVQGVK''MEAF6_S125S136_REPGSGTESDTS*PDFHNQENEPS*QEDPEDLDGS''MEAF6_S136_REPGSGTESDTSPDFHNQENEPS*QEDPEDLDGSVQGVK''MEAF6_T124S125_REPGSGTESDT*S*PDFHNQENEPSQEDPEDLDGS''MEAF6_T124S125S136_REPGSGTESDT*S*PDFHNQENEPS*QEDPE''MIA3_S1678_RGPLSQNGSFGPSPVSGGECS*PP
LTVEPPVRPLSATLN''MIA3_T1682_RGPLSQNGSFGPSPVSGGECSPPLT*VEPPVRPLSATLN''MICAL1_S786_AEGSDRGPES*PELPTPSENSMPPGLSTPTASQEGAGP''MINK1_S324S326_EETEYEYS*GS*EEEDDSHGEEGEPSSIMNVPGES''MINK1_S324S326S332_EETEYEYS*GS*EEEDDS*HGEEGEPSSIMN''MINK1_S326S332_EETEYEYSGS*EEEDDS*HGEEGEPSSIMNVPGES''MON2_S1182_SFQEILQIVSPVRDS*DKPETPPVVNVPVPVLIGPISGM''MTSS1_S653_GEHSPESPS*VGEGPQGVTSMPSSMWSGQASVNPPLPGP''MTSS1L_S639S649_RLS*LPNTAWGSPS*PEAAGYPGAGAEDEQQQLA''MYLK_S1419_AINVYGTSEPSQES*ELTTVGEKPEEPKDEVEVSDDDEK''MYLK_S1438_AINVYGTSEPSQESELTTVGEKPEEPKDEVEVS*DDDEK''NASP_S127_MENGVLGNALEGVHVEEEEGEKTEDES*LVENNDNIDEEA''NASP_T123_MENGVLGNALEGVHVEEEEGEKT*EDESLVENNDNIDEEA''NASP_T123S127_MENGVLGNALEGVHVEEEEGEKT*EDES*LVENNDN''NCBP3_S30_AEAPAGPALGLPSPEAES*GVDRGEPEPMEVEEGELEIVP''NCOR2_S2413S2420S2432_AKS*PAPGLAS*GDRPPSVSSVHS*EGD''NES_S1492S1496S1498_TALETESQDS*AEPS*GS*EEESDPVSLER''NES_S1496S1498S1502_TALETESQDSAEPS*GS*EEES*DPVSLER''NES_S1496S1498S1506_TALETESQDSAEPS*GS*EEESDPVS*LER''NES_S1496S1502S1506_TALETESQDSAEPS*GSEEES*DPVS*LER''NFIX_S288S318_S*IDDSEMESPVDDVFYPGTGRSPAAGSSQSS*GWP''NFIX_S318_SIDDSEMESPVDDVFYPGTGRSPAAGSSQSS*GWPNDVDA''NOL9_S84T90S97_RPNTATPS*PIPSPT*PASEPES*EPELESASSCH''NOS1AP_S417_SGALPVLCDPTTPKPEDLHSPPLGAGLADFAHPAGS*P''NRBP1_S436_NGIYPLTAFGLPRPQQPQQEEVTSPVVPPS*VKTPTPEP''NRBP1_T429_NGIYPLTAFGLPRPQQPQQEEVT*SPVVPPSVKTPTPEP''NRBP1_T439_NGIYPLTAFGLPRPQQPQQEEVTSPVVPPSVKT*PTPEP''NRBP1_T439T441_NGIYPLTAFGLPRPQQPQQEEVTSPVVPPSVKT*P''NRCAM_S1251S1254Y1258_KEDS*DDS*LVDY*GEGVNGQFNEDGSF''NRCAM_S1251Y1258_KEDS*DDSLVDY*GEGVNGQFNEDGSFIGQYSG''NRCAM_S1251Y1258S1271_KEDS*DDSLVDY*GEGVNGQFNEDGS*F''NRCAM_S1254S1271_KEDSDDS*LVDYGEGVNGQFNEDGS*FIGQYSG''NRCAM_S1254Y1258_KEDSDDS*LVDY*GEGVNGQFNEDGSFIGQYSG''NRCAM_S1254Y1258S1271_KEDSDDS*LVDY*GEGVNGQFNEDGS*F''PACSIN2_S356_PSSTLNVPSNPAQS*AQSQSSYNPFEDEDDTGSTVSE''PACSIN2_S359_PSSTLNVPSNPAQSAQS*QSSYNPFEDEDDTGSTVSE''PACSIN2_S359Y363_PSSTLNVPSNPAQSAQS*QSSY*NPFEDEDDTG''PACSIN2_S377_PSSTLNVPSNPAQSAQSQSSYNPFEDEDDTGSTVS*E''PAK4_S291_GAPSPGVLGPHASEPQLAPPACTPAAPAVPGPPGPRS*PQ''PALM2_T
378T379_TVTDVSTIDGNAAELVSGRPVSDT*T*EPSSPEGK''PAXBP1_T68_APGGESLLGPGPSPPSALT*PGLGAEAGGGFPGGAEPGN''PDZD2_S997_VGCYDANDASDEEEFDREGDCISLPGALPGPIRPLS*ED''PKN1_S582_SSRDPPSSPSS*LSSPIQESTAPELPSETQETPGPALCSP''PKN1_S585_SSRDPPSSPSSLSS*PIQESTAPELPSETQETPGPALCSP''PKP4_S1048S1049_SHPSLSTTNQQMSPIIQSVGSTSS*S*PALLGIR''PLCB1_S494S495_LSEQASNTYSDSS*S*MFEPSSPGAGEADTESDDD''PLCB1_S495S501_LSEQASNTYSDSSS*MFEPSS*PGAGEADTESDDD''PLCB1_S500S501_LSEQASNTYSDSSSMFEPS*S*PGAGEADTESDDD''PLCB1_S511_LSEQASNTYSDSSSMFEPSSPGAGEADTES*DDDDDDDD''PLCB1_T509S511_LSEQASNTYSDSSSMFEPSSPGAGEADT*ES*DDD''PLEKHA6_S278_VPGGGEQPAQPNGWQYHSPS*RPGSTAFPSQDGETGG''PLEKHA7_S867S871_TVPLFPHPPVPSLSTSESKPPPQPS*PPTS*PV''PPP1R3F_S233_SPPWAGAGGTGAGDPILDPGLGLGPGQASASS*PDDG''PRKCI_S544_QVVPPFKPNIS*GEFGLDNFDSQFTNEPVQLTPDDDDIV''PRKD3_S213_RLS*NVSLPGPGLSVPRPLQPEYVALPSEESHVHQEPSK''PRRC2A_S342S350_LKFS*DEEDGRDS*DEEGAEGHRDSQSASGEERP''PRRC2A_S342S363S365_LKFS*DEEDGRDSDEEGAEGHRDSQS*AS*''PRRC2A_S342S365_LKFS*DEEDGRDSDEEGAEGHRDSQSAS*GEERP''PRRC2A_S350S365_LKFSDEEDGRDS*DEEGAEGHRDSQSAS*GEERP''PRRC2C_T1498S1500S1503_TPDLSNQNSSDQANEEWET*AS*ESS*''PRRC2C_T1498S1502S1503_TPDLSNQNSSDQANEEWET*ASES*S*''PTPN23_S1576_EEPPVPEAPSSGPPSSS*LELLASLTPEAFSLDSSLR''RAPGEF2_S1277S1281S1285_QAEDTISNASSQLSS*PPTS*PQSS*''RB1_S608S612_DREGPTDHLESACPLNLPLQNNHTAADMYLS*PVRS*''RETREG2_S281S283_NAPPGGDEPLAETES*ES*EAELAGFSPVVDVK''RMDN3_S212_KDS*LDLEEEAASGASSALEAGGSSGLEDVLPLLQQADE''RMDN3_S224_KDSLDLEEEAASGAS*SALEAGGSSGLEDVLPLLQQADE''RMDN3_S224_KDSLDLEEEAASGAS*SALEAGGSSGLEDVLPLLQQADE''RMDN3_S233_KDSLDLEEEAASGASSALEAGGSS*GLEDVLPLLQQADE''RMDN3_S233_KDSLDLEEEAASGASSALEAGGSS*GLEDVLPLLQQADE''SAMD1_S427_EGGTASVATGPDSPS*PVPLPPGKPALPGADGTPFGCPP''SCRIB_T1342S1348_AFAAVPTSHPPEDAPAQPPT*PGPAAS*PEQLS''SEC62_S335S341T343_VGPGNHGTEGS*GGERHS*DT*DSDRREDDR''SEPT4_S101S102_PQAPDLYDDDLEFRPPSRPQS*S*DNQQYFCAPAP''SEPT4_S101S102_PQAPDLYDDDLEFRPPSRPQS*S*DNQQYFCAPAP''SEPT4_S102_PQAPDLYDDDLEFRPPSRPQSS*DNQQYFCAPAPLSPSA''SEPT4_S102Y107_PQAPDLYDDDLEFRPPSRPQSS*DNQQY*FCAPAP''SEPT4_S102Y107_PQAPDLYDDDLEFRPP
SRPQSS*DNQQY*FCAPAP''SEPT4_S115_PQAPDLYDDDLEFRPPSRPQSSDNQQYFCAPAPLS*PSA''SEPT4_S115_PQVPEPRPQAPDLYDDDLEFRPPSRPQSSDNQQYFCAPA''SEPT4_Y107_PQAPDLYDDDLEFRPPSRPQSSDNQQY*FCAPAPLSPSA''SEPT4_Y107_PQVPEPRPQAPDLYDDDLEFRPPSRPQSSDNQQY*FCAP''SEPT4_Y107S115_PQAPDLYDDDLEFRPPSRPQSSDNQQY*FCAPAPL''SH3BP5L_S43S44_ETPQGELRPEVVEDEVPRSPVAEEPGGGGSSS*S*''SLC39A8_S275S278_ALPAINGVTCYANPAVTEANGHIHFDNVS*VVS''SLX4_S1608S1610_EIFQYTHQTLDS*DS*EDESQSSQPLLQAPHCQT''SP110_S244S248_EDPQEMPHS*PLGS*MPEIRDNSPEPNDPEEPQEV''SP110_S248S256_EDPQEMPHSPLGS*MPEIRDNS*PEPNDPEEPQEV''SPTBN1_S2160S2161S2164_MAETVDTSEMVNGATEQRTS*S*KES*''SPTBN1_S2160S2161S2165_MAETVDTSEMVNGATEQRTS*S*KESS''SPTBN1_S2161S2164S2165_MAETVDTSEMVNGATEQRTSS*KES*S''SPTBN1_T2155S2164S2165_MAETVDTSEMVNGAT*EQRTSSKES*S''SRRM2_S377T384S387_HGGS*PQPLATT*PLS*QEPVNPPSEASPTR''STON2_S258_FPSWVTFDDNEVSCPLPPVTSPLKPNTPPS*ASVIPDVP''SUPT6H_S73S75S78_GFINDDDDEDEGEEDEGS*DS*GDS*EDDVGHK''SYMPK_T1246T1257S1259_EERSPQT*LAPVGEDAMKT*PS*PAAED''THRAP3_S289_PSPPLSSTSQMGSTLPS*GAGYQSGTHQGQFDHGSGSL''THRAP3_S310_PSPPLSSTSQMGSTLPSGAGYQSGTHQGQFDHGSGSLS''TNIK_S324S326_DETEYEYS*GS*EEEEEENDSGEPSSILNLPGESTL''TRIM28_S596_LASPS*GSTSSGLEVVAPEGTSAPGGGPGTLDDSATIC''TRIO_S1952_MALEDRPSSLLVDQGDSSSPSFNPSDNSLLSS*SSPIDE''TRIO_S1954_MALEDRPSSLLVDQGDSSSPSFNPSDNSLLSSSS*PIDE''UBE2O_S87S102S115_LIHGEDS*DSEGEEEGRGSSGCS*EAGGAGHE''VCP_S197_VVETDPSPYCIVAPDTVIHCEGEPIKREDEEES*LNEVGYD''VCP_T180_VVETDPSPYCIVAPDT*VIHCEGEPIKREDEEESLNEVGYD''VPS13D_S1712S1727_EYLSQSCPS*VSNVEYPDMPRSLPS*HMEEAP''VPS13D_S1724S1727_EYLSQSCPSVSNVEYPDMPRS*LPS*HMEEAP''VPS13D_Y1718S1727_EYLSQSCPSVSNVEY*PDMPRSLPS*HMEEAP''WNK1_S1557_VFPSEITDTVAASTAQSPGMNLSHS*ASSLSLQQAFSEL''ZSCAN18_S66S70_AFAS*PRSS*PAPPDLPTPGSAAGVQQEEPETIPE''ZSCAN18_S69S70_AFASPRS*S*PAPPDLPTPGSAAGVQQEEPETIPE''ZSCAN18_T78S81_AFASPRSSPAPPDLPT*PGS*AAGVQQEEPETIPE'

$proteomics
$transcriptomics
gtca commented 3 years ago

So it seems that there are duplicated feature names in those 2 views (acetylproteomics and phosphoproteomics) indeed.

This is something that has to be fixed in the data input, I believe.

But we should also consider verifying there are no duplicates before training a model in Python, thanks for letting us know!
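
The pre-training check suggested above could look roughly like the following. This is a sketch, not part of mofapy2: the function name `find_duplicate_keys` is made up, and the (sample, group, feature, value, view) column order is an assumption based on the long-format table posted earlier in the thread.

```python
from collections import Counter

def find_duplicate_keys(records):
    """Return keys that occur more than once, with their counts.

    `records` is an iterable of (sample, group, feature, value, view)
    tuples, the column order assumed from the table shown in the thread.
    """
    counts = Counter(
        (group, view, sample, feature)
        for sample, group, feature, value, view in records
    )
    return {key: n for key, n in counts.items() if n > 1}

records = [
    ("C3L-00104", "COAD", "A1BG_CNV", -0.628254, "CNV"),
    ("C3L-00365", "COAD", "A1BG_CNV", 0.304944, "CNV"),
    # Same sample and feature twice (made-up rows): a within-view duplicate.
    ("C3L-00104", "COAD", "RMDN3_S224", 1.0, "phosphoproteomics"),
    ("C3L-00104", "COAD", "RMDN3_S224", 2.0, "phosphoproteomics"),
]

duplicates = find_duplicate_keys(records)
for key, n in duplicates.items():
    print(f"{key} appears {n} times")
```

With the data in a pandas DataFrame, `df.duplicated(subset=["group", "view", "sample", "feature"], keep=False)` should flag the same rows, assuming those column names. Comparing whole rows instead would miss duplicates whose values differ.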

vinayduggi commented 3 years ago

Hey @gtca, yes, I will fix the data input for those two views. Thanks for your response. One peculiar thing I would like to mention: yesterday I checked for duplicate features in Python for those two dataframes, and Python reported no duplicate entries, whereas the R commands you provided show some duplicated feature names. It is a little confusing what makes the difference here. Hope this helps. Thank you!
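
One possible explanation for the discrepancy, offered as an assumption rather than something confirmed in this thread: the names printed in the R output above appear to be exactly 50 characters long (for example `VCP_S197_VVETDPSPYCIVAPDTVIHCEGEPIKREDEEES*LNEVGYD`), which would be consistent with names being cut to a fixed width when the model is written to disk. Names that are unique in full can then collide once truncated, and a `_view` suffix appended at the end would be cut off as well. The sketch below shows the effect; the width of 50 and the helper name `collisions_after_truncation` are assumptions, and the two long names are made up for illustration.

```python
from collections import defaultdict

def collisions_after_truncation(names, width=50):
    """Group full names by their first `width` characters.

    Returns only the prefixes shared by more than one entry.
    The default width of 50 is an assumption, not a documented limit.
    """
    by_prefix = defaultdict(list)
    for name in names:
        by_prefix[name[:width]].append(name)
    return {p: full for p, full in by_prefix.items() if len(full) > 1}

# Made-up names: identical for the first 50 characters, different after.
prefix = "GENE1_S100_" + "A" * 39          # 11 + 39 = 50 characters
names = [prefix + "XYZ_phosphoproteomics", prefix + "QRS_phosphoproteomics"]

print(len(names) == len(set(names)))        # full names are unique
print(collisions_after_truncation(names))   # but they share one 50-char prefix
```

If this is what happened, running the same check on the full-length names in Python before training would show which features need shorter, still-unique identifiers.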