4ureliek / TEanalysis

Analysis of TE contribution to features (transcripts or simple features). Includes utils to test enrichment.
MIT License

Is the TE_pipeline suitable for plants? #5

Closed (zhangaicen closed 5 months ago)

zhangaicen commented 3 years ago

Dear Kapusta, I recently tried to run some TE analysis in rice and luckily found the scripts you wrote. They may have been written for the mouse or human genome, but can they also be used for plants such as rice?

Thanks. Aicen

4ureliek commented 3 years ago

It will work on anything as long as the file formats are compatible; however it is not optimized and will take a long time on very large genomes.

zhangaicen commented 3 years ago

Hello, I checked my input file, but my TE annotation file is a BED file downloaded from a related website, as shown below: (image) So I cannot use the TE_pipeline with this TE file, correct?

I also have some problems getting the over-represented TE families. I want to calculate the fold change and p-value of the over-represented TE families manually, so I assume the %_masked values in the set and in the genome that you mentioned in "READ THE OUTPUTS" should be used. For example, for the LTR family, does %_masked in set mean (the total length of peaks overlapped by LTR TEs) / (the total length of LTR TEs), and does %_masked in genome mean (the total length of LTR TEs) / (the total length of all TEs that can overlap with peaks)? I don't understand this point very well. Could you please give a more detailed explanation and some examples?

Thank you for your patience.

4ureliek commented 1 year ago

Hi, I am sorry, I totally missed that you had answered here, my apologies. In general, do not hesitate to ask again!

Your BED file was indeed not exactly compatible with the script, but you could probably have modified it to mimic a RepeatMasker output (.out) file, since it contains the essential information.
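For anyone landing here with the same problem, the conversion could look roughly like the sketch below. It is not part of the TEanalysis scripts, and the BED layout is an assumption, since the actual file was only posted as an image: column 4 is taken as the TE name, column 6 as the strand, and a hypothetical column 7 as the class/family. The alignment columns (score, divergence, deletions, insertions, "left" values, positions in the repeat) are filled with placeholders because a BED file does not carry them. Whether the pipeline accepts the result was not tested.

```python
def bed_to_rm_out(bed_lines):
    """Turn BED-like lines into RepeatMasker .out-like lines (rough sketch).

    Assumed BED columns (tab-separated): chrom, start, end, TE name, score,
    strand, and a hypothetical 7th column holding class/family.
    """
    # RepeatMasker .out files start with three header lines (the third is empty).
    out = [
        "   SW   perc perc perc  query     position in query    matching  repeat        position in repeat",
        "score   div. del. ins.  sequence  begin end   (left)   repeat    class/family  begin end (left)  ID",
        "",
    ]
    rec_id = 0
    for line in bed_lines:
        line = line.rstrip("\n")
        # Skip empty lines and the usual BED comment/track lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        f = line.split("\t")
        chrom, start, end = f[0], int(f[1]), int(f[2])
        name = f[3] if len(f) > 3 else "Unknown"
        strand = f[5] if len(f) > 5 else "+"
        te_class = f[6] if len(f) > 6 else "Unknown"  # assumption: 7th column
        rec_id += 1
        length = end - start
        # RepeatMasker writes "C" (complement) for the minus strand.
        rm_strand = "C" if strand == "-" else "+"
        # Positions in the repeat are placeholders; their order flips on "C".
        if rm_strand == "+":
            rep = ["1", str(length), "(0)"]
        else:
            rep = ["(0)", str(length), "1"]
        fields = [
            "0", "0.0", "0.0", "0.0",    # placeholder score, div, del, ins
            chrom,
            str(start + 1),              # BED is 0-based, .out is 1-based
            str(end),
            "(0)",                       # placeholder "left in query"
            rm_strand, name, te_class,
        ] + rep + [str(rec_id)]
        out.append(" ".join(fields))
    return out


if __name__ == "__main__":
    example = ["Chr1\t99\t200\tCopia_1\t0\t-\tLTR/Copia"]
    print("\n".join(bed_to_rm_out(example)))
```

The placeholder divergence values mean any output that depends on TE age or divergence would not be meaningful with a file built this way; only the coordinates, names, and class/family are real.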

Hope you managed to get the info you needed!