Open dangeles opened 5 years ago
Reference naming works like this: Alaska simply uses the name of the directory containing the reference files (cdna, cds, bed). This shouldn't be too hard to do: it's just a matter of downloading the right files and placing them in a directory with the correct name, in this case one indicating either unmasked or masked.
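For illustration, here is a minimal sketch of directory-based naming. It is not Alaska's actual code: the function names, the example path and the `_masked` / `_unmasked` suffix convention are all hypothetical.

```python
from pathlib import Path


def reference_name(reference_dir):
    """Return the reference name, assumed to be the directory's own name."""
    return Path(reference_dir).name


def masking_status(name):
    """Read masking status from a hypothetical `_masked` / `_unmasked` suffix.

    Returns None when the name carries neither suffix.
    """
    for status in ("unmasked", "masked"):
        if name.endswith("_" + status):
            return status
    return None


# Hypothetical layout: one directory per reference, holding cdna, cds and bed files.
name = reference_name("references/c_elegans/example_masked")
print(name, masking_status(name))
```

With a convention like this, offering a masked and an unmasked reference only requires two directories whose names differ in the suffix.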
Three genomes are offered for each species: the soft-masked, hard-masked and unmasked versions. Unmasked and soft-masked genomes are identical for purposes of transcript mapping (repetitive regions are just written in lowercase in soft-masked genomes), so only the soft-masked genome should be offered. Genome names should reflect masking status.
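A toy sketch of why the soft-masked file is enough, assuming the usual FASTA convention that soft-masking writes repeats in lowercase and hard-masking replaces them with `N`. The helper names and the sequence are made up for illustration.

```python
def unmask(soft_masked_seq):
    """Recover the unmasked sequence: soft-masking only changes letter case."""
    return soft_masked_seq.upper()


def hard_mask(soft_masked_seq):
    """Derive the hard-masked sequence by replacing lowercase (repeat) bases with N."""
    return "".join("N" if base.islower() else base for base in soft_masked_seq)


soft = "ACGTacgtACGT"     # made-up sequence; lowercase marks a repeat
print(unmask(soft))       # same bases, repeat annotation dropped
print(hard_mask(soft))    # repeat bases are no longer recoverable
```

Both of the other versions can be derived from the soft-masked sequence, while the reverse is not true for the hard-masked one.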
We should probably also state the trade-off: unmasked genomes lose the least information, while masked genomes minimize the number of reads mapped to repetitive gene regions.