bvaldebenitom / SoloTE

GNU General Public License v3.0

Inquiry on TE Classification in soloTE paper #50

Closed trista1115 closed 4 months ago

trista1115 commented 4 months ago

Hi, Dr. Braulio Valdebenito-Maturana

I apologize for reaching out this way, but I could not find a valid email address for you, so I am asking my questions here. I recently came across your publication, "SoloTE for improved analysis of transposable elements in single-cell RNA-Seq data using locus-specific expression", and found your method for classifying TEs as young or old based on their percentage of divergence very insightful.

I am particularly interested in understanding how you applied the scTE tool to classify TEs into young or old categories. Specifically, I am curious about:

  1. Any additional parameters or settings in scTE that are important for accurately classifying TEs based on divergence.

From my understanding, scTE doesn’t quantify locus-specific TEs. If this is the case, it seems that a single TE could be classified into both the young and old categories, as shown below:

repName    type
L1M        Young
L1M        Old
L1M1       Young
L1M1       Old

Your insights would be incredibly valuable to my research, and I would greatly appreciate any detailed information or guidance you could provide on this question. Thank you very much for your time and assistance.

Best regards,
Trista

My email: jingsi_tang@163.com (preferred)

bvaldebenitom commented 4 months ago

Hi @trista1115

I'm no longer associated with the email address on the paper nor with the institution mentioned there.

Regarding the scTE categories, you are correct that it doesn't quantify locus-specific TEs, and that's one of the motivations for developing SoloTE. The "young" and "old" groups and the respective comparisons in the papers were done by starting from a ground-truth data set and then simulating reads from it.

For example, the percent divergence reported by RepeatMasker was used to classify TEs as old (>10) or young (<10). Then, for each group, reads were simulated, aligned, and processed with scTE and SoloTE. This is how I knew the category of each TE. It wasn't the other way around: I don't think you can systematically assign TEs to a category based on the identifier or repeat name alone.
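
As a rough illustration of the divergence-based grouping described above, here is a minimal Python sketch. It is not the code used for the paper. It assumes a standard RepeatMasker `.out` file, where the second whitespace-separated column is the percent divergence and the tenth is the repeat name. The function names and the handling of a value of exactly 10 (left unclassified) are my own assumptions.

```python
from collections import defaultdict

# Assumed cutoff, taken from the ">10" / "<10" values quoted above.
DIVERGENCE_CUTOFF = 10.0


def classify_divergence(perc_div, cutoff=DIVERGENCE_CUTOFF):
    """Return 'young', 'old', or None when the value sits exactly on the cutoff."""
    if perc_div < cutoff:
        return "young"
    if perc_div > cutoff:
        return "old"
    return None  # assumption: the thread does not say how exactly 10 is handled


def parse_repeatmasker_out(lines):
    """Yield (chrom, start, end, rep_name, perc_div) for each annotation line."""
    for line in lines:
        fields = line.split()
        # Skip blank lines and header lines, whose first field is not a number.
        if len(fields) < 11 or not fields[0].isdigit():
            continue
        yield fields[4], int(fields[5]), int(fields[6]), fields[9], float(fields[1])


def age_groups_by_locus(lines):
    """Map each locus (chrom, start, end, rep_name) to its age group."""
    groups = {}
    for chrom, start, end, rep_name, perc_div in parse_repeatmasker_out(lines):
        label = classify_divergence(perc_div)
        if label is not None:
            groups[(chrom, start, end, rep_name)] = label
    return groups


def age_groups_by_name(locus_groups):
    """Collect the set of age groups seen for each repeat name."""
    by_name = defaultdict(set)
    for (_chrom, _start, _end, rep_name), label in locus_groups.items():
        by_name[rep_name].add(label)
    return dict(by_name)
```

Calling `age_groups_by_locus` on the lines of a `.out` file gives one label per locus, which is the level at which the grouping is well defined. Passing that result to `age_groups_by_name` shows which repeat names end up in both groups, which is the situation raised in the question.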

Hope this helps.

trista1115 commented 4 months ago

Thank you so much for your detailed explanation. Your response clarified a lot for me and is incredibly helpful.

bvaldebenitom commented 4 months ago

You are welcome. Let me know if you have further questions. I'm closing this issue, but feel free to reopen or start a new one.