tinyRNA provides an all-in-one solution for precision analysis of sRNA-seq data. At the core of tinyRNA is a highly flexible counting utility, tiny-count, that allows for hierarchical assignment of reads to features based on positional information, extent of feature overlap, 5’ nucleotide, length, and strandedness.
GNU General Public License v3.0
tiny-count: GFF validation and reliability improvements #235
A GFF validation routine will be added that can be called at either pipeline startup or tiny-count startup. Validation will include:
Ensuring that the chromosome identifiers in the user's GFF overlap with those in their bowtie indexes, reference genome, or SAM alignment files. The latter two inputs will need heuristic approaches so that validation does not have to read gigabytes of data.
Ensuring that each feature has strand information (unless the feature describes an entire chromosome)
Ensuring that each feature defines either an ID or a gene_id attribute
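The three checks above could be sketched roughly as follows. This is only an illustration and not tiny-count's implementation: the function names (`sam_header_chroms`, `validate_gff`) and the `WHOLE_SEQUENCE_TYPES` set are hypothetical, and treating those feature types as "entire chromosome" annotations is an assumption. For SAM input, one possible heuristic is to read only the `@SQ` header lines and stop at the first alignment record, so the bulk of the file is never read.

```python
import io

# Assumption: feature types taken to describe a whole sequence, not a countable feature.
WHOLE_SEQUENCE_TYPES = {"chromosome", "region", "supercontig", "scaffold"}


def parse_attributes(column):
    """Parse a GFF3 attribute column ("key=value;key=value") into a dict."""
    attrs = {}
    for pair in column.strip().split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


def sam_header_chroms(handle):
    """Collect @SQ sequence names, stopping at the first non-header line."""
    chroms = set()
    for line in handle:
        if not line.startswith("@"):
            break  # header is over; alignment records are never read
        if line.startswith("@SQ"):
            for field in line.rstrip("\n").split("\t")[1:]:
                if field.startswith("SN:"):
                    chroms.add(field[3:])
    return chroms


def validate_gff(handle, reference_chroms):
    """Return a list of human-readable problems found in a GFF3 stream."""
    problems, gff_chroms = [], set()
    for number, line in enumerate(handle, start=1):
        if line.startswith("#") or not line.strip():
            continue  # skip pragmas, comments, and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            problems.append(f"line {number}: expected 9 columns, found {len(cols)}")
            continue
        seqid, _, ftype, _, _, _, strand, _, attr_col = cols
        gff_chroms.add(seqid)
        if ftype in WHOLE_SEQUENCE_TYPES:
            continue  # whole-chromosome annotations are exempt from the checks below
        if strand not in ("+", "-"):
            problems.append(f"line {number}: feature lacks strand information")
        attrs = parse_attributes(attr_col)
        if "ID" not in attrs and "gene_id" not in attrs:
            problems.append(f"line {number}: feature defines neither ID nor gene_id")
    if not gff_chroms & set(reference_chroms):
        problems.append("no chromosome identifiers are shared with the reference")
    return problems


# Usage with small in-memory inputs (made-up records for illustration only)
sam = io.StringIO("@HD\tVN:1.6\n@SQ\tSN:I\tLN:100\nread1\t0\tI\t1\t255\t4M\t*\t0\t0\tACGT\tIIII\n")
gff = io.StringIO("I\tsrc\tgene\t10\t50\t.\t+\t.\tID=gene:A\n")
for problem in validate_gff(gff, sam_header_chroms(sam)):
    print(problem)
```

Returning a list of problems rather than raising on the first one lets the caller report every issue in a single pass, which seems useful when the routine runs at pipeline startup.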
The following changes will be made to the ReferenceTables class to better support Ensembl GFFs:
Annotations describing entire chromosomes will be skipped
If a feature lacks an ID or gene_id attribute but has a Parent attribute, the feature will be assigned the parent's ID so that its matches and aliases are merged with the root parent's. This is how discontinuous features are currently handled; this change only introduces a special case for a missing ID/gene_id.
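A minimal sketch of the two changes described above, assuming features arrive as `(feature_type, attributes)` pairs. This is not the ReferenceTables implementation: `group_by_root`, `root_of`, and `WHOLE_SEQUENCE_TYPES` are hypothetical names, and using only the first value of a multi-valued Parent attribute is a simplifying assumption.

```python
from collections import defaultdict

# Assumption: feature types taken to describe an entire chromosome or sequence.
WHOLE_SEQUENCE_TYPES = {"chromosome", "region", "supercontig", "scaffold"}


def root_of(feature_id, parent_of):
    """Follow Parent links upward until a feature with no recorded parent is reached."""
    seen = set()
    while feature_id in parent_of and feature_id not in seen:
        seen.add(feature_id)  # guards against cyclic Parent references
        feature_id = parent_of[feature_id]
    return feature_id


def group_by_root(features):
    """Group attribute dicts under the ID of each feature's root parent."""
    parent_of, kept = {}, []
    for ftype, attrs in features:
        if ftype in WHOLE_SEQUENCE_TYPES:
            continue  # annotations describing entire chromosomes are skipped
        # Simplification: only the first Parent is used if several are listed.
        parent = attrs.get("Parent", "").split(",")[0] or None
        # Special case: with no ID or gene_id, the feature takes its parent's ID.
        own_id = attrs.get("ID") or attrs.get("gene_id") or parent
        if own_id is None:
            continue  # nothing to key on; validation would report this feature
        if parent and parent != own_id:
            parent_of[own_id] = parent
        kept.append((own_id, attrs))
    # Second pass, so that children listed before their parents still resolve.
    groups = defaultdict(list)
    for own_id, attrs in kept:
        groups[root_of(own_id, parent_of)].append(attrs)
    return dict(groups)


# Usage with made-up records: the exon has no ID, so it inherits its parent's.
features = [
    ("chromosome", {"ID": "chromosome:I"}),
    ("gene", {"ID": "gene:A", "Name": "a-1"}),
    ("mRNA", {"ID": "transcript:A.1", "Parent": "gene:A"}),
    ("exon", {"Parent": "transcript:A.1"}),
]
groups = group_by_root(features)
print(sorted(groups))
```

In this sketch the ID-less exon ends up under the same root as its transcript and gene, which mirrors the merging of matches and aliases described above.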