LiuzLab / AI_MARRVEL

AI-MARRVEL (AIM) is an AI system for rare genetic disorder diagnosis
GNU General Public License v3.0

job separation by chromosome #63

Closed hyunhwan-bcm closed 1 month ago

hyunhwan-bcm commented 1 month ago

Successfully tested against nextflow_conversion: after sorting both outputs, the results are identical to within a tolerance of 1e-10. The test sample had 724,816 variants, and the run took 5 minutes.
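For reference, a comparison of this kind (sort both outputs, then compare cell by cell within 1e-10) could be sketched as below. This is not the script used for this PR; the function names, the `keys` parameter, and the choice of an absolute tolerance are assumptions made for illustration.

```python
import csv
import io
import math


def load_sorted(text, keys):
    """Read CSV text and return its rows sorted by the given key columns."""
    rows = list(csv.DictReader(io.StringIO(text)))
    return sorted(rows, key=lambda r: tuple(r[k] for k in keys))


def cells_match(a, b, tol=1e-10):
    """Compare numerically if both cells parse as floats, else as strings."""
    try:
        x, y = float(a), float(b)
    except ValueError:
        return a == b
    if math.isnan(x) and math.isnan(y):
        return True
    # Assumption: "precision of 1e-10" is read as an absolute tolerance.
    return math.isclose(x, y, rel_tol=0.0, abs_tol=tol)


def diff_tables(left, right, keys, tol=1e-10):
    """Return (key, column, left_value, right_value) for each mismatching cell."""
    a, b = load_sorted(left, keys), load_sorted(right, keys)
    if len(a) != len(b):
        raise ValueError(f"row count differs: {len(a)} vs {len(b)}")
    mismatches = []
    for ra, rb in zip(a, b):
        row_key = tuple(ra[k] for k in keys)
        for col in ra:
            other = rb.get(col, "")
            if not cells_match(ra[col], other, tol):
                mismatches.append((row_key, col, ra[col], other))
    return mismatches


if __name__ == "__main__":
    # Toy data; the real outputs would be read from the two result files.
    left = "id,score\nb,0.5\na,1.0\n"
    right = "id,score\na,1.00000000001\nb,0.7\n"
    for item in diff_tables(left, right, keys=("id",)):
        print(item)
```

The key columns should identify a row uniquely. If one variant appears on several rows (for example, one per transcript), add enough columns to `keys` to make the sort order deterministic.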

Report

nextflow_conversion - https://gistcdn.githack.com/hyunhwan-bcm/fab1c7755bce8806f06d158332c86286/raw/index.html
the current PR - https://gistcdn.githack.com/hyunhwan-bcm/c62d2ae1189f84f311962a9be3a5d10e/raw/index.html

Timeline

the current PR - https://gistcdn.githack.com/hyunhwan-bcm/c94b9027c045277b59fd5aa5c4dde7c9/raw/index.html

hyunhwan-bcm commented 1 month ago

It got fixed.

While investigating, I found some differences from nextflow_conversion:

@@,Unnamed: 0,diffuse_Phrank_STRING,hgmdSymptomScore,...,hgmdSymptomSimScore,GERPpp_RS,gnomadAF,gnomadAFg,LRT_score,LRT_Omega,phyloP100way_vertebrate,gnomadGeneZscore,...,IMPACT,CADD_phred,CADD_PHRED,DANN_score,REVEL_score,fathmm_MKL_coding_score,conservationScoreGnomad,conservationScoreOELof,Polyphen2_HDIV_score,Polyphen2_HVAR_score,SIFT_score,zyg,FATHMM_score,M_CAP_score,MutationAssessor_score,ESP6500_AA_AF,...,CLASS,phrank,isB/LB,...,predict,min_ranking,max_ranking,...,identifier,origId,varId_dash,geneSymbol,geneEnsId,rsId,HGVSc,HGVSp,...,confidence (nd),ranking (nd),confidence level (nd),...,confidence (recessive),ranking (recessive),recessive var2,...,confidence level (nd recessive),ranking (nd recessive),nd recessive var2
→,1-100573634-A-T,0.8484286691385579→0.8985805389837482,0.0,...,0.386988942334944,-11.1→-10.9,0.0,0.0,0.0,0.0,-14.068→-4.393,1.0075,...,1.0,0.001,4.639,0.054416449317324→0.1128215789224949,0.002→0.007,6e-05→0.00064,2.0,1.0,0.0,0.0,1.0,1.0,9.66→7.75,0.000339,-3.395→-2.645,0.0,...,0.0,4.386657289640127→4.386657289640128,0,...,0.0,42→116,7562,...,1,1_100573634_A_T,1-100573634-A-T,SASS6,ENSG00000156876,-,ENST00000462159.1:n.1003-74T>A,-,...,0.0,566→577,Unsolved,...,-1.0,99999,NA,...,Unsolved,99999,NA
→,1-100573634-A-T,0.8484286691385579→0.8985805389837482,0.0,...,0.386988942334944,-11.1→-10.9,0.0,0.0,0.0,0.0,-14.068→-4.393,1.0075,...,1.0,0.001,4.639,0.054416449317324→0.1128215789224949,0.002→0.007,6e-05→0.00064,2.0,1.0,0.0,0.0,1.0,1.0,9.66→7.75,0.000339,-3.395→-2.645,0.0,...,0.0,4.386657289640127→4.386657289640128,0,...,0.0,42→116,7562,...,1,1_100573634_A_T,1-100573634-A-T,SASS6,ENSG00000156876,-,ENST00000535161.1:c.361-74T>A,-,...,0.0,566→577,Unsolved,...,-1.0,99999,NA,...,Unsolved,99999,NA
→,1-100661988-GAAAAAAA-GAA,0.8844450844162857→0.9380785846533636,0.0,...,0.386988942334944,2.0418650286041187→2.1999319018404906,0.0,0.0,0.1248318352234823→0.1847117112676056,1.6412083288859245→0.533333323943662,2.208126482213439→2.623349693251533,0.9906,...,1.0,16.63697120271033→15.767208588957056,7.524767655746509→7.219155397390272,0.8366118269851937→0.8214278370350678,0.2322019964768056→0.2349480519480519,0.509853043478261→0.4692662576687116,2.0,1.0,0.459→0.4185,0.173→0.188,0.096→0.079,1.0,0.79→0.74,0.061870585975024→0.0311653737373737,1.445→1.455,0.0,...,0.0,5.123814380349511,0,...,0.0,42→116,7562,...,1,1_100661987_GAAAAAAA_GAA,1-100661988-GAAAAAAA-GAA,DBT,ENSG00000137992,rs752915898,ENST00000370132.4:c.1282-14_1282-10del,-,...,0.0,7562,Unsolved,...,0.0,10553→10554,1-100675882-TAAGAAGAAGAAGAAGAAGAAG-T,...,Unsolved,10553→10554,1-100675882-TAAGAAGAAGAAGAAGAAGAAG-T
→,1-100661988-GAAAAAAA-GAA,0.8844450844162857→0.9380785846533636,0.0,...,0.386988942334944,2.0418650286041187→2.1999319018404906,0.0,0.0,0.1248318352234823→0.1847117112676056,1.6412083288859245→0.533333323943662,2.208126482213439→2.623349693251533,0.9906,...,1.0,16.63697120271033→15.767208588957056,7.524767655746509→7.219155397390272,0.8366118269851937→0.8214278370350678,0.2322019964768056→0.2349480519480519,0.509853043478261→0.4692662576687116,2.0,1.0,0.459→0.4185,0.173→0.188,0.096→0.079,1.0,0.79→0.74,0.061870585975024→0.0311653737373737,1.445→1.455,0.0,...,0.0,5.123814380349511,0,...,0.0,42→116,7562,...,1,1_100661987_GAAAAAAA_GAA,1-100661988-GAAAAAAA-GAA,DBT,ENSG00000137992,rs752915898,ENST00000370132.4:c.1282-16_1282-10del,-,...,0.0,7562,Unsolved,...,0.0,10553→10554,1-100675882-TAAGAAGAAGAAGAAGAAGAAG-T,...,Unsolved,10553→10554,1-100675882-TAAGAAGAAGAAGAAGAAGAAG-T
→,1-100661988-GAAAAAAA-GAA,0.8844450844162857→0.9380785846533636,0.0,...,0.386988942334944,2.0418650286041187→2.1999319018404906,0.0,0.0,0.1248318352234823→0.1847117112676056,1.6412083288859245→0.533333323943662,2.208126482213439→2.623349693251533,0.9906,...,1.0,16.63697120271033→15.767208588957056,7.524767655746509→7.219155397390272,0.8366118269851937→0.8214278370350678,0.2322019964768056→0.2349480519480519,0.509853043478261→0.4692662576687116,2.0,1.0,0.459→0.4185,0.173→0.188,0.096→0.079,1.0,0.79→0.74,0.061870585975024→0.0311653737373737,1.445→1.455,0.0,...,0.0,5.123814380349511,0,...,0.0,42→116,7562,...,1,1_100661987_GAAAAAAA_GAA,1-100661988-GAAAAAAA-GAA,RP11-305E17.7,ENSG00000271415,rs752915898,-,-,...,0.0,7562,Unsolved,...,0.0,10553→10554,1-100675882-TAAGAAGAAGAAGAAGAAGAAG-T,...,Unsolved,10553→10554,1-100675882-TAAGAAGAAGAAGAAGAAGAAG-T
→,1-100661988-GAAAAAAA-GAA,0.8844450844162857→0.9380785846533636,0.0,...,0.386988942334944,2.0418650286041187→2.1999319018404906,0.0,0.0,0.1248318352234823→0.1847117112676056,1.6412083288859245→0.533333323943662,2.208126482213439→2.623349693251533,0.9906,...,1.0,16.63697120271033→15.767208588957056,7.524767655746509→7.219155397390272,0.8366118269851937→0.8214278370350678,0.2322019964768056→0.2349480519480519,0.509853043478261→0.4692662576687116,2.0,1.0,0.459→0.4185,0.173→0.188,0.096→0.079,1.0,0.79→0.74,0.061870585975024→0.0311653737373737,1.445→1.455,0.0,...,0.0,5.123814380349511,0,...,0.0,42→116,7562,...,1,1_100661987_GAAAAAAA_GAA,1-100661988-GAAAAAAA-GAA,RP11-305E17.7,ENSG00000271415,rs752915898,-,-,...,0.0,7562,Unsolved,...,0.0,10553→10554,1-100675882-TAAGAAGAAGAAGAAGAAGAAG-T,...,Unsolved,10553→10554,1-100675882-TAAGAAGAAGAAGAAGAAGAAG-T
→,1-100675882-TAAGAAGAAGAAGAAGAAGAAG-T,0.8844450844162857→0.9380785846533636,0.0,...,0.386988942334944,2.0418650286041187→2.1999319018404906,0.0,0.0,0.1248318352234823→0.1847117112676056,1.6412083288859245→0.533333323943662,2.208126482213439→2.623349693251533,0.9906,...,1.0,16.63697120271033→15.767208588957056,7.524767655746509→7.219155397390272,0.8366118269851937→0.8214278370350678,0.2322019964768056→0.2349480519480519,0.509853043478261→0.4692662576687116,2.0,1.0,0.459→0.4185,0.173→0.188,0.096→0.079,2.0,0.79→0.74,0.061870585975024→0.0311653737373737,1.445→1.455,0.0,...,0.0,5.123814380349511,0,...,0.0,42→116,7562,...,1,1_100675881_TAAGAAGAAGAAGAAGAAGAAG_T,1-100675882-TAAGAAGAAGAAGAAGAAGAAG-T,BRI3P1,ENSG00000225169,rs145600331,-,-,...,0.0,7562,Unsolved,...,0.0,10553→10554,1-100661988-GAAAAAAA-GAA,...,Unsolved,10553→10554,1-100661988-GAAAAAAA-GAA
→,1-100675882-TAAGAAGAAGAAGAAGAAGAAG-T,0.8844450844162857→0.9380785846533636,0.0,...,0.386988942334944,2.0418650286041187→2.1999319018404906,0.0,0.0,0.1248318352234823→0.1847117112676056,1.6412083288859245→0.533333323943662,2.208126482213439→2.623349693251533,0.9906,...,1.0,16.63697120271033→15.767208588957056,7.524767655746509→7.219155397390272,0.8366118269851937→0.8214278370350678,0.2322019964768056→0.2349480519480519,0.509853043478261→0.4692662576687116,2.0,1.0,0.459→0.4185,0.173→0.188,0.096→0.079,2.0,0.79→0.74,0.061870585975024→0.0311653737373737,1.445→1.455,0.0,...,0.0,5.123814380349511,0,...,0.0,42→116,7562,...,1,1_100675881_TAAGAAGAAGAAGAAGAAGAAG_T,1-100675882-TAAGAAGAAGAAGAAGAAGAAG-T,DBT,ENSG00000137992,rs145600331,-,-,...,0.0,7562,Unsolved,...,0.0,10553→10554,1-100661988-GAAAAAAA-GAA,...,Unsolved,10553→10554,1-100661988-GAAAAAAA-GAA
→,1-100675882-TAAGAAGAAGAAGAAGAAGAAG-T,0.8844450844162857→0.9380785846533636,0.0,...,0.386988942334944,2.0418650286041187→2.1999319018404906,0.0,0.0,0.1248318352234823→0.1847117112676056,1.6412083288859245→0.533333323943662,2.208126482213439→2.623349693251533,0.9906,...,1.0,16.63697120271033→15.767208588957056,7.524767655746509→7.219155397390272,0.8366118269851937→0.8214278370350678,0.2322019964768056→0.2349480519480519,0.509853043478261→0.4692662576687116,2.0,1.0,0.459→0.4185,0.173→0.188,0.096→0.079,2.0,0.79→0.74,0.061870585975024→0.0311653737373737,1.445→1.455,0.0,...,0.0,5.123814380349511,0,...,0.0,42→116,7562,...,1,1_100675881_TAAGAAGAAGAAGAAGAAGAAG_T,1-100675882-TAAGAAGAAGAAGAAGAAGAAG-T,DBT,ENSG00000137992,rs145600331,ENST00000370132.4:c.1017+348_1017+368del,-,...,0.0,7562,Unsolved,...,0.0,10553→10554,1-100661988-GAAAAAAA-GAA,...,Unsolved,10553→10554,1-100661988-GAAAAAAA-GAA
→,1-100733207-G-GT,0.2978688937686742→0.3069327298909689,0.0,...,0.318182103000133,2.0418650286041187→2.1999319018404906,0.0680509,0.0680509,0.1248318352234823→0.1847117112676056,1.6412083288859245→0.533333323943662,2.208126482213439→2.623349693251533,0.30262,...,1.0,16.63697120271033→15.767208588957056,7.524767655746509→7.219155397390272,0.8366118269851937→0.8214278370350678,0.2322019964768056→0.2349480519480519,0.509853043478261→0.4692662576687116,1.0,1.0,0.459→0.4185,0.173→0.188,0.096→0.079,1.0,0.79→0.74,0.061870585975024→0.0311653737373737,1.445→1.455,0.0,...,0.0,0.0,0,...,0.0,42→116,7562,...,1,1_100733207_G_GT,1-100733207-G-GT,RP11-305E17.6,ENSG00000224616,rs60634058,-,-,...,0.0,7562,Unsolved,...,-1.0,99999,NA,...,Unsolved,99999,NA
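When reading the diff above, it helps to separate cells that differ only in the last floating-point digit (such as `4.386657289640127→4.386657289640128`) from cells with a substantive change (such as `-14.068→-4.393`). A minimal sketch of that triage follows, assuming changed cells are written as `old→new`; the helper names and the 1e-10 threshold are illustrative and are not part of the AI_MARRVEL code.

```python
ARROW = "→"


def parse_change(cell):
    """Return (old, new) as floats if the cell is 'old→new' with numeric sides, else None."""
    if ARROW not in cell:
        return None
    old, new = cell.split(ARROW, 1)
    try:
        return float(old), float(new)
    except ValueError:
        return None


def classify(cell, tol=1e-10):
    """Label a diff cell as unchanged, float noise, a real change, or a non-numeric change."""
    if ARROW not in cell:
        return "unchanged"
    change = parse_change(cell)
    if change is None:
        return "non-numeric change"
    old, new = change
    return "float noise" if abs(old - new) <= tol else "real change"


if __name__ == "__main__":
    for cell in ["4.386657289640127→4.386657289640128", "-14.068→-4.393", "42→116", "0.0"]:
        print(cell, "->", classify(cell))
```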