PavlidisLab / rnaseq-pipeline

RNA-seq pipeline for raw sequence alignment and transcript/gene quantification.
The Unlicense

Handle SRA experiments with multiple lanes mapped on distinct runs #94

Open arteymix opened 4 months ago

arteymix commented 4 months ago

Example: https://www.ncbi.nlm.nih.gov/sra/?term=SRX19303543

Run,ReleaseDate,LoadDate,spots,bases,spots_with_mates,avgLength,size_MB,AssemblyName,download_path,Experiment,LibraryName,LibraryStrategy,LibrarySelection,LibrarySource,LibraryLayout,InsertSize,InsertDev,Platform,Model,SRAStudy,BioProject,Study_Pubmed_id,ProjectID,Sample,BioSample,SampleType,TaxID,ScientificName,SampleName,g1k_pop_code,source,g1k_analysis_group,Subject_ID,Sex,Disease,Tumor,Affection_Status,Analyte_Type,Histological_Type,Body_Site,CenterName,Submission,dbgap_study_accession,Consent,RunHash,ReadHash
SRR23362510,2023-03-04 00:13:53,2023-02-07 15:21:34,33757271,4287173417,0,127,1642,,https://sra-downloadb.be-md.ncbi.nlm.nih.gov/sos4/sra-pub-zq-3/SRR023/23362/SRR23362510/SRR23362510.lite.1,SRX19303543,GSM7031205,RNA-Seq,cDNA,TRANSCRIPTOMIC,PAIRED,0,0,ILLUMINA,Illumina NovaSeq 6000,SRP421366,PRJNA932339,,932339,SRS16703502,SAMN33190761,simple,9606,Homo sapiens,GSM7031205,,,,,,,no,,,,,"MOLECULAR NEUROGENETICS, WALLENBERG NEUROSCIENCE CENTER, LUND UNIVERSITY",SRA1586479,,public,E1D7908479DB68AC5BF2D02363843723,74BBC66805FB82F94EB3452E14DF9B20
SRR23362511,2023-03-04 00:13:55,2023-02-07 15:07:32,33582785,4265013695,0,127,1645,,https://sra-downloadb.be-md.ncbi.nlm.nih.gov/sos4/sra-pub-zq-3/SRR023/23362/SRR23362511/SRR23362511.lite.1,SRX19303543,GSM7031205,RNA-Seq,cDNA,TRANSCRIPTOMIC,PAIRED,0,0,ILLUMINA,Illumina NovaSeq 6000,SRP421366,PRJNA932339,,932339,SRS16703502,SAMN33190761,simple,9606,Homo sapiens,GSM7031205,,,,,,,no,,,,,"MOLECULAR NEUROGENETICS, WALLENBERG NEUROSCIENCE CENTER, LUND UNIVERSITY",SRA1586479,,public,F9EBCCF2EE039048F7DF04362D0B9A7B,1EE5D4D071EDB9940102EC1A47C2012E
SRR23362512,2023-03-04 00:13:55,2023-02-07 15:13:12,33586989,4265547603,0,127,1635,,https://sra-downloadb.be-md.ncbi.nlm.nih.gov/sos4/sra-pub-zq-3/SRR023/23362/SRR23362512/SRR23362512.lite.1,SRX19303543,GSM7031205,RNA-Seq,cDNA,TRANSCRIPTOMIC,PAIRED,0,0,ILLUMINA,Illumina NovaSeq 6000,SRP421366,PRJNA932339,,932339,SRS16703502,SAMN33190761,simple,9606,Homo sapiens,GSM7031205,,,,,,,no,,,,,"MOLECULAR NEUROGENETICS, WALLENBERG NEUROSCIENCE CENTER, LUND UNIVERSITY",SRA1586479,,public,C6294D9108B441B431D0A79E3FD38AB1,F76192753E8D0311E3BD7E2919B72AF1
SRR23362513,2023-03-04 00:13:55,2023-02-07 15:17:01,33427011,4245230397,0,127,1631,,https://sra-downloadb.be-md.ncbi.nlm.nih.gov/sos4/sra-pub-zq-3/SRR023/23362/SRR23362513/SRR23362513.lite.1,SRX19303543,GSM7031205,RNA-Seq,cDNA,TRANSCRIPTOMIC,PAIRED,0,0,ILLUMINA,Illumina NovaSeq 6000,SRP421366,PRJNA932339,,932339,SRS16703502,SAMN33190761,simple,9606,Homo sapiens,GSM7031205,,,,,,,no,,,,,"MOLECULAR NEUROGENETICS, WALLENBERG NEUROSCIENCE CENTER, LUND UNIVERSITY",SRA1586479,,public,3967C30F1B279DC47ABA8FFBBA9ADF75,D62A8FB4F31C612652A6E4F8B28A56E2
SRR23362514,2023-03-04 00:13:55,2023-02-07 15:24:06,57117283,7253894941,0,127,2899,,https://sra-downloadb.be-md.ncbi.nlm.nih.gov/sos4/sra-pub-zq-3/SRR023/23362/SRR23362514/SRR23362514.lite.1,SRX19303543,GSM7031205,RNA-Seq,cDNA,TRANSCRIPTOMIC,PAIRED,0,0,ILLUMINA,Illumina NovaSeq 6000,SRP421366,PRJNA932339,,932339,SRS16703502,SAMN33190761,simple,9606,Homo sapiens,GSM7031205,,,,,,,no,,,,,"MOLECULAR NEUROGENETICS, WALLENBERG NEUROSCIENCE CENTER, LUND UNIVERSITY",SRA1586479,,public,2AC482F669A1A6473F9F344E3A2C240F,342BE1E8386C569711B667C03A3D1184
SRR23362515,2023-03-04 00:13:55,2023-02-07 15:29:43,57138159,7256546193,0,127,2932,,https://sra-downloadb.be-md.ncbi.nlm.nih.gov/sos4/sra-pub-zq-3/SRR023/23362/SRR23362515/SRR23362515.lite.1,SRX19303543,GSM7031205,RNA-Seq,cDNA,TRANSCRIPTOMIC,PAIRED,0,0,ILLUMINA,Illumina NovaSeq 6000,SRP421366,PRJNA932339,,932339,SRS16703502,SAMN33190761,simple,9606,Homo sapiens,GSM7031205,,,,,,,no,,,,,"MOLECULAR NEUROGENETICS, WALLENBERG NEUROSCIENCE CENTER, LUND UNIVERSITY",SRA1586479,,public,27DECC599ED381BF772E39872B0639F7,66155FA1849CF725DA6A2C2F1EB9D81E
SRR23362516,2023-03-04 00:13:55,2023-02-07 15:28:47,57071250,7248048750,0,127,2914,,https://sra-downloadb.be-md.ncbi.nlm.nih.gov/sos4/sra-pub-zq-3/SRR023/23362/SRR23362516/SRR23362516.lite.1,SRX19303543,GSM7031205,RNA-Seq,cDNA,TRANSCRIPTOMIC,PAIRED,0,0,ILLUMINA,Illumina NovaSeq 6000,SRP421366,PRJNA932339,,932339,SRS16703502,SAMN33190761,simple,9606,Homo sapiens,GSM7031205,,,,,,,no,,,,,"MOLECULAR NEUROGENETICS, WALLENBERG NEUROSCIENCE CENTER, LUND UNIVERSITY",SRA1586479,,public,2C3181C595DAA7686DB8AD836B9FD9E7,539051CB6A33D15E2F4BBC766FFBA7C2
SRR23362517,2023-03-04 00:13:55,2023-02-07 15:31:39,56810433,7214924991,0,127,2891,,https://sra-downloadb.be-md.ncbi.nlm.nih.gov/sos4/sra-pub-zq-3/SRR023/23362/SRR23362517/SRR23362517.lite.1,SRX19303543,GSM7031205,RNA-Seq,cDNA,TRANSCRIPTOMIC,PAIRED,0,0,ILLUMINA,Illumina NovaSeq 6000,SRP421366,PRJNA932339,,932339,SRS16703502,SAMN33190761,simple,9606,Homo sapiens,GSM7031205,,,,,,,no,,,,,"MOLECULAR NEUROGENETICS, WALLENBERG NEUROSCIENCE CENTER, LUND UNIVERSITY",SRA1586479,,public,FBA981C516A70A4F42107649F7056FC1,89575037DD543FD8F09124BF8E6DCFAF
arteymix commented 4 months ago

The runs of such an experiment can generally be concatenated safely (even as gzip-compressed files!) before being processed further.
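
A minimal sketch of that idea, not the pipeline's actual implementation. It relies on the fact that a gzip file may contain several members back to back, so appending compressed files byte for byte yields a valid gzip file that decompresses to the concatenated contents. The function names (`group_runs_by_experiment`, `concat_gzip`) are hypothetical; the `Run` and `Experiment` column names come from the runinfo CSV above.

```python
import csv
import io
import shutil
from collections import defaultdict


def group_runs_by_experiment(runinfo_csv: str) -> dict[str, list[str]]:
    """Map each Experiment accession to its Run accessions, in file order.

    `runinfo_csv` is the text of an SRA runinfo CSV (header line included).
    """
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(runinfo_csv)):
        groups[row["Experiment"]].append(row["Run"])
    return dict(groups)


def concat_gzip(parts, dest):
    """Append gzip files byte for byte, without recompressing.

    The result is a multi-member gzip file; standard tools (zcat, Python's
    gzip module) read it as the concatenation of the uncompressed inputs.
    """
    with open(dest, "wb") as out:
        for part in parts:
            with open(part, "rb") as src:
                shutil.copyfileobj(src, out)
```

For a paired layout like the one above, both mate files would need to be built with the runs in the same order, so that read N in the first file still pairs with read N in the second.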