chess-genome / chess

Comprehensive Human Expressed SequenceS
http://ccb.jhu.edu/chess/
GNU General Public License v3.0

Fasta file for transcript sequences #5

tbrittoborges closed this issue 2 years ago

tbrittoborges commented 2 years ago

Hi @alevar,

Could you provide the link to the genome sequence FASTA file (GRCh38.p8 or GRCh38.p13) that you used to compute the transcript abundances reported in the manuscript?

Thanks in advance

tbrittoborges commented 2 years ago

I think this is the link (can you confirm?): http://ccb.jhu.edu/chess/data/hg38_p8.fa.gz

alevar commented 2 years ago

Hi @tbrittoborges,

Yes, the file you found is correct. For that part of the analysis we used patch 8 of the GRCh38 reference genome sequence (GRCh38.p8).

Please let me know if I can help with anything else!

tbrittoborges commented 2 years ago

Thank you! Do you have the equivalent files for p13?

alevar commented 2 years ago

We are actively working on a new release of the catalog, which will be based on patch 12 of GRCh38 (GRCh38.p12). However, as many have noted (ourselves included), the alternative haplotypes and the sequences duplicated in patches and fixes cause serious problems for alignment, transcriptome assembly and quantification. For this reason, the currently available version of the CHESS catalog includes an "assembly" subset, which keeps only records on the primary, random (unlocalized) and unplaced scaffolds.

Going forward, we intend to leave out the "alt", "fix" and "patch" contigs of GRCh38 and to annotate only the primary, random and unplaced scaffolds. Because GRCh38 patch releases add these extra contigs without changing the primary assembly sequence, the patch version of GRCh38 should not matter once they are excluded.
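To make the "assembly" subset idea concrete, here is a minimal Python sketch that drops alt and fix contigs from a genome FASTA (or a GTF annotation) and keeps primary, random and unplaced sequences. It assumes UCSC-style hg38 contig names (for example `chr1_KI270706v1_random`, `chrUn_KI270302v1`, `chr1_KI270762v1_alt`, `chr1_KN196472v1_fix`); check the headers of the file you actually use. The function names are hypothetical, and this is an illustration, not the CHESS team's own pipeline.

```python
import gzip
import sys
from typing import Iterable, Iterator

# Assumption: UCSC-style hg38 contig names. In that scheme alternate
# haplotypes and novel patches end in "_alt" and fix patches end in "_fix";
# "_patch" is included only as a hypothetical catch-all.
EXCLUDED_SUFFIXES = ("_alt", "_fix", "_patch")
KEPT_CATEGORIES = {"primary", "random", "unplaced"}


def contig_category(name: str) -> str:
    """Classify a contig name as primary, random, unplaced or excluded."""
    if name.endswith(EXCLUDED_SUFFIXES):
        return name.rsplit("_", 1)[1]  # "alt", "fix" or "patch"
    if name.endswith("_random"):
        return "random"  # unlocalized scaffold, e.g. chr1_KI270706v1_random
    if name.startswith("chrUn_"):
        return "unplaced"  # e.g. chrUn_KI270302v1
    return "primary"  # e.g. chr1, chrX, chrM


def keep_contig(name: str) -> bool:
    """True if the contig belongs to the kept (non-alt, non-fix) subset."""
    return contig_category(name) in KEPT_CATEGORIES


def filter_fasta(lines: Iterable[str]) -> Iterator[str]:
    """Yield only the FASTA records (header plus sequence) on kept contigs."""
    keep = False
    for line in lines:
        if line.startswith(">"):
            # The contig name is the first whitespace-separated token.
            fields = line[1:].split()
            keep = bool(fields) and keep_contig(fields[0])
        if keep:
            yield line


def filter_gtf(lines: Iterable[str]) -> Iterator[str]:
    """Yield comment lines and GTF records whose seqname (column 1) is kept."""
    for line in lines:
        if line.startswith("#") or keep_contig(line.split("\t", 1)[0]):
            yield line


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python filter_contigs.py hg38_p8.fa.gz > hg38_assembly.fa
    with gzip.open(sys.argv[1], "rt") as handle:
        sys.stdout.writelines(filter_fasta(handle))
```

`filter_gtf` applies the same rule to an annotation file, so a genome FASTA and a GTF filtered this way stay consistent with each other regardless of which patch release they came from.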

Hope this information is helpful! Please let us know if we may be of help with any other questions!

Ales