broadinstitute / epi-SHARE-seq-pipeline

Epigenomics Program pipeline to analyze SHARE-seq data.
MIT License

New IGVF transcriptome reference #85

Closed by emattei 1 year ago

emattei commented 1 year ago

https://www.synapse.org/#!Synapse:syn39048501

Gencode v43 for human and M32 for mouse

emattei commented 1 year ago

Human is completed. Mouse needs a new TSS file and new cCREs. The index should be ready.

sidwekhande commented 1 year ago

This affects the DORCs workflow: we have a .Rdata file that contains the hg19/hg38/mm10 TSSRanges as GRanges objects. It needs to be modified either to include the new TSSRanges or to accept a BED file as input.

sidwekhande commented 1 year ago

TSS bed files were created using transcripts from TxDb.mm39 and TxDb.hg38 R packages.

hg38: gs://broad-buenrostro-pipeline-genome-annotations/GRCh38/genes_annotations/hg38.TxDb_transcripts.TSS.bed

mm39: gs://broad-buenrostro-pipeline-genome-annotations/mm39/gene-annotations/mm39.TxDb_transcripts.TSS.bed
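
For readers unfamiliar with how a TSS BED file is derived from transcript coordinates, here is a minimal Python sketch. It is not the code used to produce the files above (those came from the TxDb R packages). The `Transcript` type, the `tss_bed_lines` function, and the BED6 output layout are assumptions made for illustration. The idea is that the TSS is the first base of a transcript on the `+` strand and the last base on the `-` strand.

```python
from typing import Iterable, List, NamedTuple


class Transcript(NamedTuple):
    """Hypothetical transcript record using BED-style coordinates."""
    chrom: str
    start: int   # 0-based, inclusive
    end: int     # 0-based, exclusive
    name: str
    strand: str  # "+" or "-"


def tss_bed_lines(transcripts: Iterable[Transcript]) -> List[str]:
    """Return sorted, de-duplicated 1-bp BED6 lines marking each transcript's TSS."""
    seen = set()
    for tx in transcripts:
        if tx.strand == "+":
            tss_start = tx.start        # first base of the transcript
        elif tx.strand == "-":
            tss_start = tx.end - 1      # last base of the transcript
        else:
            continue                    # skip records without a usable strand
        seen.add((tx.chrom, tss_start, tss_start + 1, tx.name, tx.strand))
    # The score column is filled with 0 because no score is defined here.
    return [f"{c}\t{s}\t{e}\t{n}\t0\t{st}" for c, s, e, n, st in sorted(seen)]
```

Calling `tss_bed_lines` on a list of `Transcript` records yields tab-separated lines that can be written directly to a `.bed` file.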

nchernia commented 1 year ago

How does this compare with the hg38 TSS bed file we already have? (Are we not going to use it anymore?)

emattei commented 1 year ago

We haven't swapped it in for the genome TSS yet. It's ready, but it needs some more testing; at least everyone knows where it is.

nchernia commented 1 year ago

Are these packages consistent with the GTFs from Synapse?

sidwekhande commented 1 year ago

It depends on what you mean by "consistent". These are published packages that define the hg38 and mm39 transcripts, exons, and genes. We could define that information from the synapse gtf files but there are a lot of decisions to make regarding how to select the right transcripts/genes, and then test them.
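
To make the "decisions" above concrete, here is a rough Python sketch of what deriving TSS positions from a GTF could look like. It is only an illustration under stated assumptions: the function name `tss_per_gene` is hypothetical, it keeps every distinct TSS of every `transcript` record, and it keys genes by `gene_name` (falling back to `gene_id`). Whether to keep all transcripts, only a tagged subset, or a single TSS per gene is exactly the kind of choice that would need to be made and tested.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# Matches GTF attributes of the form: key "value"
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def tss_per_gene(gtf_lines: Iterable[str]) -> Dict[str, Set[Tuple[str, int, str]]]:
    """Map gene name -> set of (chrom, 0-based TSS position, strand)."""
    result = defaultdict(set)
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue                      # skip blank lines and header comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "transcript":
            continue                      # only transcript records define a TSS here
        chrom, strand = fields[0], fields[6]
        start, end = int(fields[3]), int(fields[4])   # GTF: 1-based, inclusive
        if strand not in ("+", "-"):
            continue
        attrs = dict(ATTR_RE.findall(fields[8]))
        gene = attrs.get("gene_name") or attrs.get("gene_id")
        if gene is None:
            continue
        # Convert to a 0-based position: first base on "+", last base on "-".
        tss = start - 1 if strand == "+" else end - 1
        result[gene].add((chrom, tss, strand))
    return dict(result)
```

Comparing the output of such a function against the TxDb-derived BED files would be one way to check how closely the two sources agree.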

nchernia commented 1 year ago

These are now completed.

Human hg38 with v43 GTF gs://broad-buenrostro-pipeline-genome-annotations/IGVF_human_v43/Homo_sapiens_genome_files_hg38_v43.tsv

Mouse mm39 with M32 GTF gs://broad-buenrostro-pipeline-genome-annotations/IGVF_mouse_v32/Mus_musculus_genome_files_mm39_v32.tsv

sidwekhande commented 1 year ago

Currently, no cCREs are available for mm39. A possible solution would be to use liftOver?

emattei commented 1 year ago

makes sense to me

sidwekhande commented 1 year ago

mm39 ccre bed file: gs://broad-buenrostro-pipeline-genome-annotations/mm39/mm39_ccre_liftover_resized.bed

This file was created using UCSC liftOver from mm10 (mm10 ccre bed file) to mm39. 8 coordinates failed to lift over, and 39 coordinates had to be resized to 300 bp.
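
The comment above does not say how the resizing was done. As a sketch only, one common approach is to re-center each interval on its midpoint and give it a fixed width. The helper names `resize_to_width` and `resize_bed_line` below are hypothetical, and centering on the midpoint is an assumption, not a description of the actual procedure.

```python
from typing import Tuple


def resize_to_width(start: int, end: int, width: int = 300) -> Tuple[int, int]:
    """Return a (start, end) interval of the given width centered on the midpoint."""
    mid = (start + end) // 2
    new_start = max(0, mid - width // 2)   # clamp so coordinates never go negative
    return new_start, new_start + width


def resize_bed_line(line: str, width: int = 300) -> str:
    """Resize the interval of one tab-separated BED line, keeping any extra columns."""
    fields = line.rstrip("\n").split("\t")
    new_start, new_end = resize_to_width(int(fields[1]), int(fields[2]), width)
    fields[1], fields[2] = str(new_start), str(new_end)
    return "\t".join(fields)
```

This sketch does not clamp against chromosome ends; doing so would require a chromosome sizes file for mm39.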