davemcg / EiaD_build

Snakemake pipeline to create EiaD dataset for eyeIntegration

Missing files: metadata #2

Closed davemcg closed 5 years ago

davemcg commented 5 years ago
  1. core_tight
    > head(core_tight)
    study_accession                                           study_title
    1       SRP012682 Genotype-Tissue Expression (GTEx) Common Fund Project
    2       SRP012682 Genotype-Tissue Expression (GTEx) Common Fund Project
    3       SRP012682 Genotype-Tissue Expression (GTEx) Common Fund Project
    4       SRP012682 Genotype-Tissue Expression (GTEx) Common Fund Project
    5       SRP012682 Genotype-Tissue Expression (GTEx) Common Fund Project
    6       SRP012682 Genotype-Tissue Expression (GTEx) Common Fund Project
    study_abstract
    1 Lay Description.  The aim of the Genotype-Tissue Expression (GTEx) Project is to increase our understanding of how changes in our genes affect human health and disease with the ultimate goal of improving health care for future generations. GTEx will create a database that researchers can use to study how inherited changes in genes lead to common diseases.  GTEx researchers are studying genes in different tissues obtained from many different people. The GTEx project also includes a study of the GTEx donor consent process - this study will help ensure that the consent process and other aspects of the project effectively address the concerns and expectations of participants in the study. GTEx is a pioneering project that uses state-of-the-art protocols for obtaining and storing a large range of organs and tissues, and for testing them in the lab. Until now, no project has analyzed genetic variation and expression in as many tissues from the same person in... (for more see dbGaP study page.)
    2 Lay Description.  The aim of the Genotype-Tissue Expression (GTEx) Project is to increase our understanding of how changes in our genes affect human health and disease with the ultimate goal of improving health care for future generations. GTEx will create a database that researchers can use to study how inherited changes in genes lead to common diseases.  GTEx researchers are studying genes in different tissues obtained from many different people. The GTEx project also includes a study of the GTEx donor consent process - this study will help ensure that the consent process and other aspects of the project effectively address the concerns and expectations of participants in the study. GTEx is a pioneering project that uses state-of-the-art protocols for obtaining and storing a large range of organs and tissues, and for testing them in the lab. Until now, no project has analyzed genetic variation and expression in as many tissues from the same person in... (for more see dbGaP study page.)
    3 Lay Description.  The aim of the Genotype-Tissue Expression (GTEx) Project is to increase our understanding of how changes in our genes affect human health and disease with the ultimate goal of improving health care for future generations. GTEx will create a database that researchers can use to study how inherited changes in genes lead to common diseases.  GTEx researchers are studying genes in different tissues obtained from many different people. The GTEx project also includes a study of the GTEx donor consent process - this study will help ensure that the consent process and other aspects of the project effectively address the concerns and expectations of participants in the study. GTEx is a pioneering project that uses state-of-the-art protocols for obtaining and storing a large range of organs and tissues, and for testing them in the lab. Until now, no project has analyzed genetic variation and expression in as many tissues from the same person in... (for more see dbGaP study page.)
    4 Lay Description.  The aim of the Genotype-Tissue Expression (GTEx) Project is to increase our understanding of how changes in our genes affect human health and disease with the ultimate goal of improving health care for future generations. GTEx will create a database that researchers can use to study how inherited changes in genes lead to common diseases.  GTEx researchers are studying genes in different tissues obtained from many different people. The GTEx project also includes a study of the GTEx donor consent process - this study will help ensure that the consent process and other aspects of the project effectively address the concerns and expectations of participants in the study. GTEx is a pioneering project that uses state-of-the-art protocols for obtaining and storing a large range of organs and tissues, and for testing them in the lab. Until now, no project has analyzed genetic variation and expression in as many tissues from the same person in... (for more see dbGaP study page.)
    5 Lay Description.  The aim of the Genotype-Tissue Expression (GTEx) Project is to increase our understanding of how changes in our genes affect human health and disease with the ultimate goal of improving health care for future generations. GTEx will create a database that researchers can use to study how inherited changes in genes lead to common diseases.  GTEx researchers are studying genes in different tissues obtained from many different people. The GTEx project also includes a study of the GTEx donor consent process - this study will help ensure that the consent process and other aspects of the project effectively address the concerns and expectations of participants in the study. GTEx is a pioneering project that uses state-of-the-art protocols for obtaining and storing a large range of organs and tissues, and for testing them in the lab. Until now, no project has analyzed genetic variation and expression in as many tissues from the same person in... (for more see dbGaP study page.)
    6 Lay Description.  The aim of the Genotype-Tissue Expression (GTEx) Project is to increase our understanding of how changes in our genes affect human health and disease with the ultimate goal of improving health care for future generations. GTEx will create a database that researchers can use to study how inherited changes in genes lead to common diseases.  GTEx researchers are studying genes in different tissues obtained from many different people. The GTEx project also includes a study of the GTEx donor consent process - this study will help ensure that the consent process and other aspects of the project effectively address the concerns and expectations of participants in the study. GTEx is a pioneering project that uses state-of-the-art protocols for obtaining and storing a large range of organs and tissues, and for testing them in the lab. Until now, no project has analyzed genetic variation and expression in as many tissues from the same person in... (for more see dbGaP study page.)
    sample_accession
    1        SRS374904
    2        SRS374911
    3        SRS333043
    4        SRS333036
    5        SRS374908
    6        SRS374769
    sample_attribute
    1            gap_accession: phs000424 || submitter handle: GTEx || biospecimen repository: GTEx || study name: Genotype-Tissue Expression (GTEx) || study design: Cross-Sectional || biospecimen repository sample id: GTEX-QVJO-0011-R1A-SM-2S1QI || submitted sample id: GTEX-QVJO-0011-R1A-SM-2S1QI || submitted subject id: GTEX-QVJO || gap_sample_id: 860333 || gap_subject_id: 590977 || sex: female || body site: Brain - Hippocampus || histological type: Brain || analyte type: RNA:Total RNA || is tumor: No || molecular data type: Allele-Specific Expression || molecular data type: RNA Seq (NGS) || gap_consent_code: 1 || gap_consent_short_name: GRU
    2                                                                        gap_accession: phs000424 || submitter handle: GTEx || biospecimen repository: GTEx || study name: Genotype-Tissue Expression (GTEx) || study design: Cross-Sectional || biospecimen repository sample id: GTEX-N7MT-1226-SM-2TC6K || submitted sample id: GTEX-N7MT-1226-SM-2TC6K || submitted subject id: GTEX-N7MT || gap_sample_id: 859937 || gap_subject_id: 590921 || sex: female || body site: Brain - Cerebellum || histological type: Brain || analyte type: RNA:Total RNA || is tumor: No || molecular data type: RNA Seq (NGS) || gap_consent_code: 1 || gap_consent_short_name: GRU
    3 gap_accession: phs000424 || submitter handle: GTEx || biospecimen repository: GTEx || study name: Genotype-Tissue Expression (GTEx) || study design: Cross-Sectional || biospecimen repository sample id: GTEX-N7MT-0011-R10A-SM-2I3E1 || submitted sample id: GTEX-N7MT-0011-R10A-SM-2I3E1 || submitted subject id: GTEX-N7MT || gap_sample_id: 735835 || gap_subject_id: 590921 || sex: female || body site: Brain - Frontal Cortex (BA9) || histological type: Brain || analyte type: RNA:Total RNA || is tumor: No || molecular data type: Allele-Specific Expression || molecular data type: RNA Seq (NGS) || gap_consent_code: 1 || gap_consent_short_name: GRU
    4                                            gap_accession: phs000424 || submitter handle: GTEx || biospecimen repository: GTEx || study name: Genotype-Tissue Expression (GTEx) || study design: Cross-Sectional || biospecimen repository sample id: GTEX-N7MS-0011-R3a-SM-2HMKD || submitted sample id: GTEX-N7MS-0011-R3a-SM-2HMKD || submitted subject id: GTEX-N7MS || gap_sample_id: 735815 || gap_subject_id: 590920 || sex: male || body site: Brain - Anterior cingulate cortex (BA24) || histological type: Brain || analyte type: RNA:Total RNA || is tumor: No || molecular data type: RNA Seq (NGS) || gap_consent_code: 1 || gap_consent_short_name: GRU
    5                   gap_accession: phs000424 || submitter handle: GTEx || biospecimen repository: GTEx || study name: Genotype-Tissue Expression (GTEx) || study design: Cross-Sectional || biospecimen repository sample id: GTEX-RUSQ-0526-SM-2TF72 || submitted sample id: GTEX-RUSQ-0526-SM-2TF72 || submitted subject id: GTEX-RUSQ || gap_sample_id: 860647 || gap_subject_id: 678159 || sex: male || body site: Heart - Left Ventricle || histological type: Heart || analyte type: RNA:Total RNA || is tumor: No || molecular data type: Allele-Specific Expression || molecular data type: RNA Seq (NGS) || gap_consent_code: 1 || gap_consent_short_name: GRU
    6                       gap_accession: phs000424 || submitter handle: GTEx || biospecimen repository: GTEx || study name: Genotype-Tissue Expression (GTEx) || study design: Cross-Sectional || biospecimen repository sample id: GTEX-N7MS-0426-SM-2YUN6 || submitted sample id: GTEX-N7MS-0426-SM-2YUN6 || submitted subject id: GTEX-N7MS || gap_sample_id: 859931 || gap_subject_id: 590920 || sex: male || body site: Muscle - Skeletal || histological type: Muscle || analyte type: RNA:Total RNA || is tumor: No || molecular data type: Allele-Specific Expression || molecular data type: RNA Seq (NGS) || gap_consent_code: 1 || gap_consent_short_name: GRU
    Tissue                                 Sub_Tissue Origin
    1  Brain                       Brain - Hippocampus  Tissue
    2  Brain                        Brain - Cerebellum  Tissue
    3  Brain              Brain - Frontal Cortex (BA9)  Tissue
    4  Brain  Brain - Anterior cingulate cortex (BA24)  Tissue
    5  Heart                    Heart - Left Ventricle  Tissue
    6 Muscle                         Muscle - Skeletal  Tissue
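
The `sample_attribute` strings above are `||`-separated `key: value` pairs, and in the six rows shown `Tissue` matches `histological type` while `Sub_Tissue` matches `body site`. A minimal Python sketch of pulling those fields out, assuming that correspondence holds in general (the issue does not say how the columns were actually built, and `parse_sample_attribute` is a hypothetical helper, not part of the pipeline):

```python
def parse_sample_attribute(attr):
    """Split a '||'-delimited 'key: value' string into a dict of value lists.

    Values are kept in lists because a key can repeat
    (e.g. 'molecular data type' appears twice in some rows).
    """
    parsed = {}
    for field in attr.split("||"):
        field = field.strip()
        if not field:
            continue
        # Split on the first ': ' only, so 'RNA:Total RNA' stays intact.
        key, _, value = field.partition(": ")
        parsed.setdefault(key.strip(), []).append(value.strip())
    return parsed


# Shortened version of row 1 from the output above.
example = (
    "sex: female || body site: Brain - Hippocampus || "
    "histological type: Brain || analyte type: RNA:Total RNA || "
    "molecular data type: Allele-Specific Expression || "
    "molecular data type: RNA Seq (NGS)"
)

parsed = parse_sample_attribute(example)
tissue = parsed["histological type"][0]   # assumed source of Tissue
sub_tissue = parsed["body site"][0]       # assumed source of Sub_Tissue
```

Keeping a list per key avoids silently dropping the second `molecular data type` entry.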
davemcg commented 5 years ago

@vinay-swamy, could you get this done ASAP? I'm stuck on many steps until this is done. Please implement it as a separate rule.

vinay-swamy commented 5 years ago

yup working on it now

vinay-swamy commented 5 years ago

@davemcg should work now

davemcg commented 5 years ago

Now done (found some typos). Please follow this notation style: a separate line for each input/output with an explicit name, and all inputs and outputs passed to the script as arguments. Your hard-coded output path did not match the one in the Snakefile.


# output sample metadata and gene/tx lists for eyeIntegration
rule make_meta_info:
    input:
        expand('results/smoothed_filtered_tpms_{level}.csv',level=['gene','transcript'])
    params:
        working_dir = config['working_dir']
    output:
        metadata = 'results/core_tight.Rdata',
        tx_names = 'results/tx_names.Rdata',
        gene_names = 'results/gene_names.Rdata'
    shell:
        '''
        module load R
        Rscript {config[scripts_dir]}/make_meta_info.R \
          {config[sampleFile]} \
          {ref_GTF_basic} \
          {config[sqlfile]} \
          {input} \
          {params.working_dir} \
          {output.metadata} \
          {output.tx_names} \
          {output.gene_names}
        '''
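
Because `{input}` expands to two space-separated paths (gene, then transcript, following the `expand` order), the script in the rule above receives nine positional arguments. The sketch below only illustrates that ordering in Python. The real `make_meta_info.R` is not shown in this thread, so the argument names, the `name_args` helper, and the placeholder config values are all assumptions.

```python
# Hypothetical names for the nine positional arguments, in the order
# the shell command in the rule passes them.
ARG_NAMES = [
    "sample_file",      # {config[sampleFile]}
    "ref_gtf",          # {ref_GTF_basic}
    "sql_file",         # {config[sqlfile]}
    "gene_tpm_csv",     # first path from {input}
    "tx_tpm_csv",       # second path from {input}
    "working_dir",      # {params.working_dir}
    "metadata_out",     # {output.metadata}
    "tx_names_out",     # {output.tx_names}
    "gene_names_out",   # {output.gene_names}
]


def name_args(argv):
    """Map positional arguments to names, failing loudly on a count mismatch."""
    if len(argv) != len(ARG_NAMES):
        raise ValueError(
            f"expected {len(ARG_NAMES)} arguments, got {len(argv)}"
        )
    return dict(zip(ARG_NAMES, argv))


# Placeholder values for the config entries; output paths are from the rule.
args = name_args([
    "sample_table.tsv",
    "ref.basic.gtf",
    "metadata.sqlite",
    "results/smoothed_filtered_tpms_gene.csv",
    "results/smoothed_filtered_tpms_transcript.csv",
    "/path/to/working_dir",
    "results/core_tight.Rdata",
    "results/tx_names.Rdata",
    "results/gene_names.Rdata",
])
```

Taking every output path from the arguments, rather than hard-coding it in the script, means the file the script writes cannot drift from the path the rule declares.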
davemcg commented 5 years ago

https://github.com/davemcg/eyeIntegration_auto_build/commit/1378b45cf9c93833998be0cd79be72200248da83

davemcg commented 5 years ago

@vinay-swamy I've re-opened as:

I'll fix this

davemcg commented 5 years ago

https://github.com/davemcg/eyeIntegration_auto_build/commit/dcd626098e96c58dfa75032217dd53c1e5e55ffb