Hi Yupeng,
There are a few ways you can do this. The simplest is to concatenate the libraries into a single file and then cluster the sequences with cd-hit-est using the Wicker family definition (80% identity over 80% of the shorter sequence):
cat *.fa > merged_libraries.fa
cd-hit-est -d 0 -aS 0.8 -c 0.8 -G 0 -g 1 -b 500 -r 1 -i merged_libraries.fa -o merged_libraries.clstrd.fa
There are some caveats to this though, such as chimeric consensus sequences appearing in some cases. Other methods involve network analysis: you align each TE consensus to every other TE consensus and then decide on appropriate cutoffs based on alignment score, alignment length, etc. I would recommend having a deeper look into the available options before choosing!
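One thing that may be worth checking before the cat step: if the per-chromosome libraries reuse the same family names (an assumption on my part, I have not checked your files), the merged file would contain duplicate headers. Below is a rough sketch of a concatenation that prefixes each header with its source file name. The function name `merge_libraries` and the prefixing scheme are hypothetical, not part of Earl Grey or cd-hit.

```shell
# Hypothetical helper (not part of Earl Grey or cd-hit): concatenate FASTA
# libraries into one file, prefixing every header with the source file name
# so that identically named families from different runs stay distinct.
# Usage sketch: merge_libraries merged_libraries.fa chr1.fa chr2.fa
merge_libraries() {
    local out="$1"
    shift
    : > "$out"                      # create or truncate the output file
    local f tag
    for f in "$@"; do
        # Skip the output file itself in case a glob such as *.fa matched it
        if [ "$f" -ef "$out" ]; then
            continue
        fi
        tag="$(basename "$f")"
        tag="${tag%%.*}"            # file name up to the first dot
        # Rewrite ">name" as ">tag_name"; sequence lines pass through unchanged
        awk -v tag="$tag" '/^>/ { sub(/^>/, ">" tag "_") } { print }' "$f" >> "$out"
    done
}
```

The merged file can then be passed to cd-hit-est exactly as in the command above. Anything after the family name in a header (for example a `#Class/Family` suffix) is left as it is.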
Hi,
I have finished running earlGreyLibConstruct for each chromosome. My next question is how to merge the outputs and run the following step. Should I cat the .families.fa.strained files together?
Best, Yupeng