TobyBaril / EarlGrey

Earl Grey: A fully automated TE curation and annotation pipeline

Running EarlGrey with already available RepeatModeler Consensus file #149

Closed Homap closed 1 month ago

Homap commented 1 month ago

Hi Toby,

Thanks very much for the great tool.

Some time ago, I ran RepeatModeler on my set of genomes, and I have the classified consensus FASTA file for each. Would it be possible to feed this file to EarlGrey so it doesn't have to run the RepeatModeler step?

Thanks again and best wishes, Homa

TobyBaril commented 1 month ago

Hi Homa,

This is (kind of) possible with a bit of trickery! I'm guessing you would like to run these RepeatModeler libraries through the refinement process as well?

If so, you can trick Earl Grey into running with your existing library. As an example, suppose the command you would normally run is something like:

earlGrey -g genome.fasta -s species_1 -o output_directory

Now, we have already made a RepeatModeler library, but Earl Grey doesn't know this. We can trick it into thinking it has already made the library by making the output directories and putting the existing library in the correct place with the correct name:

mkdir -p output_directory/species_1_EarlGrey/
mkdir -p output_directory/species_1_EarlGrey/species_1_Database/
mkdir -p output_directory/species_1_EarlGrey/species_1_RepeatModeler/
mkdir -p output_directory/species_1_EarlGrey/species_1_strainer/
mkdir -p output_directory/species_1_EarlGrey/species_1_Curated_Library/
mkdir -p output_directory/species_1_EarlGrey/species_1_RepeatMasker_Against_Custom_Library/
mkdir -p output_directory/species_1_EarlGrey/species_1_RepeatLandscape/
mkdir -p output_directory/species_1_EarlGrey/species_1_mergedRepeats/
mkdir -p output_directory/species_1_EarlGrey/species_1_summaryFiles/

Once you have made the directories that Earl Grey will look for, you need to put the existing RepeatModeler fasta library in the correct place with the correct name:

cp repeatmodeler.fasta output_directory/species_1_EarlGrey/species_1_Database/species_1-families.fa
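The directory creation and copy steps above can be wrapped into one small shell function, which may be handy when staging libraries for several genomes. This is only a sketch: the function name `stage_earlgrey_library` and its three arguments are illustrative and not part of Earl Grey, and the directory names are taken directly from the list above.

```shell
# Sketch: pre-stage an existing RepeatModeler library so that Earl Grey
# finds it in the expected place. The function name and arguments are
# hypothetical helpers, not part of Earl Grey itself.
stage_earlgrey_library() {
    local library="$1"   # existing RepeatModeler consensus FASTA
    local species="$2"   # the value you will pass to earlGrey -s
    local outdir="$3"    # the value you will pass to earlGrey -o
    local base="${outdir}/${species}_EarlGrey"
    local sub

    # Refuse to stage a missing or empty library.
    if [ ! -s "$library" ]; then
        echo "library not found or empty: $library" >&2
        return 1
    fi

    # Create the directories listed above (suffixes copied from that list).
    for sub in Database RepeatModeler strainer Curated_Library \
               RepeatMasker_Against_Custom_Library RepeatLandscape \
               mergedRepeats summaryFiles; do
        mkdir -p "${base}/${species}_${sub}" || return 1
    done

    # Copy the library to the name and location given above.
    cp "$library" "${base}/${species}_Database/${species}-families.fa"
}
```

For the example in this thread, the call would be `stage_earlgrey_library repeatmodeler.fasta species_1 output_directory`, with the same `-s` and `-o` values then passed to earlGrey.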

Once this is done, Earl Grey will detect that this file exists and is consistent with the options supplied (i.e. the species name given with -s and the output directory given with -o). It will assume that a previous run has already completed some steps and will pick up from the BEAT process for your library when you run:

earlGrey -g genome.fasta -s species_1 -o output_directory

I hope this helps!

Homap commented 1 month ago

This is wonderful! Thank you so much!

I just have another question. One of the options of earlGrey is -r, the RepeatMasker species. I'm running earlGrey on many algal genomes. For some, I see there is a library with the exact species name in Repbase; for others, there is none.

If I specify a species name and that species does not exist, would RepeatMasker fail or simply ignore it?

Thanks again very much!

TobyBaril commented 1 month ago

Hi Homa!

I've just dropped you an email with some more info!