bvaldebenitom / SoloTE

GNU General Public License v3.0

Question about output #40

Closed mousepixels closed 2 weeks ago

mousepixels commented 5 months ago

Is there an explanation of the output files/directories?

For example, what is legacytes? Why does the locustes matrix contain only genes and no TEs? A general explanation would be helpful.

Also, I need clarification on the stats output. What exactly does locus-specific mean? Don't the reads map to specific loci, which are then summed to subfamilies? Or does it only mean that no subfamily is represented by a single specific TE?

TE detected UMIs are distributed as follows: Locus-specific TEs: 0 UMIs (0.0%). Subfamily TEs: 7594114 (100.0%).

Thank you!!

bvaldebenitom commented 3 months ago

Hi @mousepixels,

sorry for the late answer.

There is a graphical description here: https://github.com/bvaldebenitom/SoloTE/issues/30#issuecomment-1721876808

Briefly, Legacy TEs is the standard output of SoloTE, which combines locus-specific expression and subfamily expression. Locus-specific expression corresponds to reads that align uniquely to a TE in the genome, so their genomic location can be determined unambiguously. The remaining reads, which align to more than one location, are counted at the subfamily level because their genomic location cannot be determined.
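
To make the counting rule above concrete, here is a minimal Python sketch. It is not SoloTE's actual code: the tuple layout `(umi, te_locus, te_subfamily, n_hits)`, the locus ID format, and the function name `count_te_umis` are assumptions made up for illustration. It only shows the idea that uniquely aligned reads count towards a locus, while multi-mapped reads count towards the subfamily.

```python
from collections import Counter


def count_te_umis(alignments):
    """Split TE UMIs into locus-level and subfamily-level counts.

    `alignments` is an iterable of (umi, te_locus, te_subfamily, n_hits)
    tuples. This layout is hypothetical and only serves the illustration.
    """
    locus_umis = set()      # (locus, umi) pairs from uniquely aligned reads
    subfamily_umis = set()  # (subfamily, umi) pairs from multi-mapped reads

    for umi, te_locus, te_subfamily, n_hits in alignments:
        if n_hits == 1:
            # Unique alignment: the genomic location is unambiguous.
            locus_umis.add((te_locus, umi))
        else:
            # Multi-mapped: only the subfamily can be assigned.
            subfamily_umis.add((te_subfamily, umi))

    # Using sets means each UMI is counted once per feature.
    locus_counts = Counter(locus for locus, _ in locus_umis)
    subfamily_counts = Counter(subfamily for subfamily, _ in subfamily_umis)
    return locus_counts, subfamily_counts


# Toy input: two unique reads on one L1HS locus, one AluY read with 3 hits.
example = [
    ("UMI1", "chr1:100-200", "L1HS", 1),
    ("UMI2", "chr1:100-200", "L1HS", 1),
    ("UMI3", "chr2:5-50", "AluY", 3),
    ("UMI3", "chr3:7-70", "AluY", 3),
]
locus_counts, subfamily_counts = count_te_umis(example)
print(dict(locus_counts), dict(subfamily_counts))
```

Under this rule, a run that reports 0 locus-specific UMIs is one where no read ended up in the first branch.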

You are probably not getting any locus-specific TEs because the BED file doesn't contain the keyword "chr". Did you generate the TE BED file with our tool, or did you obtain it some other way?
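
A quick way to check this is a small Python sketch like the one below. It assumes that the "chr" keyword refers to the chromosome name in the first BED column (e.g. `chr1` rather than `1`); the function name `chr_prefix_summary` is hypothetical and not part of SoloTE.

```python
def chr_prefix_summary(bed_lines):
    """Return (records_with_chr, total_records) for an iterable of BED lines.

    Assumption: the "chr" keyword is expected at the start of the
    chromosome name in the first column.
    """
    with_chr = 0
    total = 0
    for line in bed_lines:
        line = line.strip()
        # Skip blank lines and common BED header/comment lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        chrom = line.split()[0]
        total += 1
        if chrom.startswith("chr"):
            with_chr += 1
    return with_chr, total
```

Since a file object iterates over its lines, it can be passed in directly, e.g. `with open(path) as fh: print(chr_prefix_summary(fh))`, where `path` points to your TE BED file. If the first number is 0, the annotation uses chromosome names without "chr".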

github-actions[bot] commented 3 weeks ago

This issue is stale because it has been open for 10 days with no activity.

github-actions[bot] commented 2 weeks ago

This issue was closed because it has been inactive for 14 days since being marked as stale.