Lioscro / alaska-parse

Automated and friendly RNA-seq analysis (migrated to Parse Server)
http://alaska.caltech.edu

Genomes for mapping #21

Open dangeles opened 5 years ago

dangeles commented 5 years ago

Three genomes are offered for each species, one each for the soft-masked, hard-masked, and unmasked versions. Unmasked and soft-masked genomes are identical for purposes of transcript mapping (repetitive regions are just lowercased in soft-masked genomes), so only the soft-masked genome should be offered. Genome names should reflect masking status.
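To illustrate why the unmasked and soft-masked versions are interchangeable for mapping, here is a minimal sketch. It assumes the usual convention that soft-masking writes repeats in lowercase and hard-masking replaces them with `N`. The function names and the toy sequence are made up for illustration and are not part of Alaska.

```python
def same_sequence_ignoring_case(unmasked: str, soft_masked: str) -> bool:
    """Soft-masking only changes letter case, so the bases are unchanged."""
    return unmasked.upper() == soft_masked.upper()


def hard_mask(soft_masked: str) -> str:
    """Turn a soft-masked sequence into a hard-masked one (repeats become N)."""
    return "".join("N" if base.islower() else base for base in soft_masked)


# Toy example: the middle six bases are annotated as a repeat.
unmasked = "ACGTACGTTTAGGG"
soft = "ACGTacgtttAGGG"

print(same_sequence_ignoring_case(unmasked, soft))  # True
print(hard_mask(soft))                              # ACGTNNNNNNAGGG
```

The hard-masked output shows where information is lost: the repeat bases cannot be recovered from the `N` run, whereas the soft-masked string still carries them.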

We probably should state that unmasked genomes lead to the least loss of information, but masked genomes minimize the number of reads mapped to repetitive gene regions.

Lioscro commented 5 years ago

Reference naming works like this: Alaska simply uses the name of the directory containing the reference files (cdna, cds, bed). So this shouldn't be too hard to do. It's just a matter of downloading the right files and placing them in a directory whose name indicates the masking status (in this case, either unmasked or masked).
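A rough sketch of what directory-based naming could look like, assuming one subdirectory per reference under a species directory. The `<version>_<masking>` naming scheme, the label set, the helper names, and the example version string are all hypothetical; they are not taken from the Alaska code.

```python
import tempfile
from pathlib import Path
from typing import List

# Hypothetical masking labels to embed in the directory name.
MASKING_LABELS = ("unmasked", "soft_masked", "hard_masked")


def reference_dir_name(version: str, masking: str) -> str:
    """Build a directory name that carries the masking status."""
    if masking not in MASKING_LABELS:
        raise ValueError(f"unknown masking status: {masking!r}")
    return f"{version}_{masking}"


def list_reference_names(species_dir: Path) -> List[str]:
    """The reference name is just the name of each subdirectory."""
    return sorted(p.name for p in species_dir.iterdir() if p.is_dir())


# Demo in a throwaway directory; "WS268" is only an example version string.
with tempfile.TemporaryDirectory() as tmp:
    species_dir = Path(tmp)
    for masking in ("soft_masked", "hard_masked"):
        (species_dir / reference_dir_name("WS268", masking)).mkdir()
    names = list_reference_names(species_dir)

print(names)  # ['WS268_hard_masked', 'WS268_soft_masked']
```

Because the name comes straight from the directory, renaming or adding a directory is enough to change what is offered, with no separate registry to update.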