epi2me-labs / wf-single-cell


Transcripts with class_code "u" in gene expression matrix #110

Closed kataksk closed 2 months ago

kataksk commented 5 months ago

Thank you very much for creating a great workflow!

I am analysing non-model organisms with incomplete gene models, and gffcompare classifies many transcripts as class_code "u". Looking at the final results (e.g. gene_processed_feature_bc_matrix and transcript_processed_feature_bc_matrix), transcripts classified as class_code "u" (and similar) seem to be ignored in the gene expression matrix. Is this interpretation correct?

cjw85 commented 5 months ago

Reads with the class codes 'i', 'y', 'p', or 's' are "discarded", in the sense that they don't contribute to the matrix. See here.

nrhorner commented 2 months ago

@kataksk reads are also currently discarded if they have a class code of 'u', as the workflow only counts transcripts with a match to the supplied annotation. Closing this now.
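
For anyone wanting to check how many of their reads fall into the discarded classes, the filtering rule described in this thread can be sketched roughly as below. This is only an illustration, not the workflow's actual code: the function names, the tuple layout `(read_id, transcript_id, class_code)`, and the example data are all hypothetical, and the set of discarded codes is simply the five codes mentioned in the replies above.

```python
from collections import Counter

# Class codes named in this thread as not contributing to the matrix.
# Assumption: illustrative only, not copied from the workflow source.
DISCARDED_CLASS_CODES = {"i", "y", "p", "s", "u"}


def split_by_class_code(assignments, discarded=DISCARDED_CLASS_CODES):
    """Split (read_id, transcript_id, class_code) tuples into kept and dropped."""
    kept, dropped = [], []
    for read_id, transcript_id, class_code in assignments:
        # A read is dropped when its class code is in the discarded set.
        target = dropped if class_code in discarded else kept
        target.append((read_id, transcript_id, class_code))
    return kept, dropped


def tally_class_codes(assignments):
    """Count how many assignments carry each class code."""
    return Counter(code for _, _, code in assignments)


# Hypothetical example data: two exact matches ("="), one "u", one "i".
assignments = [
    ("read1", "tx1", "="),
    ("read2", "tx2", "u"),
    ("read3", "tx3", "="),
    ("read4", "tx4", "i"),
]

kept, dropped = split_by_class_code(assignments)
print("kept:", [r[0] for r in kept])
print("dropped:", [r[0] for r in dropped])
print("tally:", dict(tally_class_codes(assignments)))
```

With incomplete gene models, a large share of `u` in the tally would be consistent with many reads being absent from the final matrices.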