TobyBaril / EarlGrey

Earl Grey: A fully automated TE curation and annotation pipeline

Merged the output of earlgreylibconstruct #148

Closed: sherlock0088 closed this issue 1 month ago

sherlock0088 commented 1 month ago

Hi,

I have finished running earlGreyLibConstruct for each chromosome. My next question is how to merge the outputs and run the next step. Should I cat the .families.fa.strained files together?

Best, Yupeng

TobyBaril commented 1 month ago

Hi Yupeng,

There are a few ways you can do this. The simplest is to cat the libraries into a single file and then cluster the sequences with cd-hit-est using the Wicker family definition (at least 80% identity over at least 80% of the shorter sequence):

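# Concatenate the per-chromosome libraries (adjust the glob to match your library file names)
cat *.fa > merged_libraries.fa
# Cluster at 80% identity (-c) over 80% of the shorter sequence (-aS), searching both strands (-r 1)
cd-hit-est -d 0 -aS 0.8 -c 0.8 -G 0 -g 1 -b 500 -r 1 -i merged_libraries.fa -o merged_libraries.clstrd.fa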
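
If the per-chromosome runs reuse the same family names, a plain cat can leave duplicate FASTA headers in the merged file. The following is only a sketch of one way around that, not part of Earl Grey: merge_libraries and count_seqs are hypothetical helper names, and using the file name up to its first dot as the prefix is an assumption you may want to change.

```shell
# Hypothetical helper: concatenate FASTA libraries into one file, prefixing
# every header with a tag taken from the source file name (text before the
# first dot) so that sequence IDs stay unique across chromosomes.
# Usage: merge_libraries OUTPUT INPUT [INPUT...]
merge_libraries() (
    out="$1"
    shift
    : > "$out"                      # create or truncate the output file
    for f in "$@"; do
        tag=$(basename "$f")
        tag=${tag%%.*}              # e.g. chr1.families.fa.strained -> chr1
        awk -v tag="$tag" '/^>/ { sub(/^>/, ">" tag "_") } { print }' "$f" >> "$out"
    done
)

# Hypothetical helper: count the sequences (header lines) in a FASTA file.
count_seqs() {
    grep -c '^>' "$1"
}
```

For example, merge_libraries merged_libraries.fa chr*.families.fa.strained would build the merged file, and comparing count_seqs on the merged file and on the cd-hit-est output shows how many consensus sequences were collapsed by clustering. Write the output somewhere the input glob cannot match, so the merged file is never read back in as an input.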

There are some caveats to this, though, such as chimeric consensus sequences appearing in some cases. Other methods involve network analysis: align each TE consensus to every other TE consensus, then decide on appropriate cutoffs based on alignment scores relative to sequence lengths, and so on. I would recommend taking a deeper look at the available options before choosing!
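
One low-effort way to start checking what the clustering did is to summarise the .clstr file that cd-hit-est writes next to its output (merged_libraries.clstrd.fa.clstr for the command above). The sketch below assumes the standard CD-HIT .clstr layout, where each cluster starts with a ">Cluster N" line followed by one line per member; cluster_sizes and multi_member_clusters are hypothetical helper names, and a large cluster is only a hint to inspect, not proof of a chimeric consensus.

```shell
# Hypothetical helper: print "<cluster_id><TAB><member_count>" for every
# cluster in a CD-HIT .clstr file.
cluster_sizes() {
    awk '
        /^>Cluster/ { if (id != "") print id "\t" n; id = $2; n = 0; next }
        NF          { n++ }          # every non-empty, non-header line is a member
        END         { if (id != "") print id "\t" n }
    ' "$1"
}

# Hypothetical helper: keep only clusters where several consensus sequences
# were merged, which are the ones worth inspecting by hand.
multi_member_clusters() {
    cluster_sizes "$1" | awk -F '\t' '$2 > 1'
}
```

Running multi_member_clusters merged_libraries.clstrd.fa.clstr lists the clusters in which more than one per-chromosome consensus was collapsed into a single representative.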