CompSynBioLab-KoreaUniv / FunGAP

FunGAP: fungal Genome Annotation Pipeline

Masking genome #79

Open Hoberti opened 2 years ago

Hoberti commented 2 years ago

Hello,

I have a question about running FunGAP. Is there any way to give FunGAP a previously masked genome so that it skips the masking step? Alternatively, can I give FunGAP a pre-made custom repeat library to use with RepeatMasker?

Thanks in advance

Héctor

mbnmbn00 commented 2 years ago

Yes, that's possible.

Let's say your output directory is fungap_out (--output_dir option).

1) Make the library directory

mkdir -p fungap_out/repeat_modeler_out/RM

2) Put your library file into the directory and name it consensi.fa.classified

cp $YOUR_LIBRARY_FILE fungap_out/repeat_modeler_out/RM/consensi.fa.classified

Then run FunGAP. It will detect the consensi.fa.classified file and skip the RepeatModeler step.
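The two steps above can be wrapped in a small shell function. This is only a sketch: `install_repeat_library` is a hypothetical helper name, not something FunGAP provides, and it assumes the directory layout described in this comment (`repeat_modeler_out/RM` under the `--output_dir` directory).

```shell
#!/bin/sh
# Hypothetical helper (not part of FunGAP): copies a pre-made repeat library
# to the location where, per the steps above, FunGAP looks for RepeatModeler output.
# Usage: install_repeat_library <library_file> <fungap_output_dir>
install_repeat_library() {
    lib_file=$1
    output_dir=$2
    rm_dir="$output_dir/repeat_modeler_out/RM"

    # Refuse to continue if the library file is missing or empty.
    if [ ! -s "$lib_file" ]; then
        echo "error: library file '$lib_file' is missing or empty" >&2
        return 1
    fi

    # -p creates the parent directories too and does not fail if they exist.
    mkdir -p "$rm_dir" || return 1

    # The file must carry this exact name to be picked up.
    cp "$lib_file" "$rm_dir/consensi.fa.classified"
}
```

For example, `install_repeat_library my_repeats.fa fungap_out` would place the library before starting FunGAP with `--output_dir fungap_out` (`my_repeats.fa` is a placeholder file name).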

To use your masked assembly

You can simply supply your masked assembly as the input assembly, along with your library file. Note that Maker will mask repeat sequences again using RepeatMasker. Other than that, I cannot think of an easy way to do it.

But let me know if you really do not want FunGAP to mask any of the repeat sequences it detects. Let's find a way together.

Hoberti commented 2 years ago

Thank you for the response, that works perfectly.

I think masking could be improved by building the custom repeat library with dedicated tools such as TransposonPSI, LTRharvest, etc. Could I suggest adding a wrapper that builds a custom TE library before masking?

Héctor

mbnmbn00 commented 2 years ago

Yes. I can do that. But it will take some time! I will let you know when it's finished!

Hoberti commented 2 years ago

Perfect. If you are interested, I could help with some pre-made wrappers that I have. hoberti@inia.org.uy

LiZhihua1982 commented 1 year ago

Hi, if I use RepeatMasker like this:

(RepeatMasker) lizhihua@lizhihua-T640:/media/lizhihua/data/Fungi/HR3_annotation2$ RepeatMasker -species fugu HR3_assembly_flye.fasta
RepeatMasker version 4.1.1
Search Engine: NCBI/RMBLAST [ 2.2.27+ ]

Using Master RepeatMasker Database: /home/lizhihua/miniconda3/envs/RepeatMasker/share/RepeatMasker/Libraries/RepeatMaskerLib.h5
Title : Dfam withRBRM
Version : 3.2
Date : 2020-07-02
Families : 51,78

The result is HR3_assembly_flye.fasta.masked, HR3_assembly_flye.fasta.cat, HR3_assembly_flye.fasta.out, and HR3_assembly_flye.fasta.tbl. I do not understand which of these files should be renamed to consensi.fa.classified. I found that their size and format are very different.

mbnmbn00 commented 1 year ago

consensi.fa.classified is the repeat library, in FASTA format, that RepeatModeler generates. So the file you provide as consensi.fa.classified must be a repeat library in FASTA format.
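A quick way to sanity-check a candidate file before copying it is to confirm that it at least looks like FASTA. The sketch below uses two hypothetical helper functions (they are not part of FunGAP or RepeatMasker). It only checks that the first non-blank line starts with `>`; it cannot tell whether the sequences are actually a repeat library rather than, say, a genome assembly.

```shell
#!/bin/sh
# Hypothetical helpers, not part of FunGAP or RepeatMasker.

# Succeeds if the file is non-empty and its first non-blank line starts
# with '>', which is how a FASTA record header begins.
looks_like_fasta() {
    [ -s "$1" ] || return 1
    first_line=$(grep -v '^[[:space:]]*$' "$1" | head -n 1)
    case "$first_line" in
        '>'*) return 0 ;;
        *)    return 1 ;;
    esac
}

# Prints the number of FASTA header lines, i.e. the number of sequences.
count_fasta_records() {
    grep -c '^>' "$1"
}
```

For example, `looks_like_fasta my_repeats.fa && count_fasta_records my_repeats.fa` prints the number of consensus sequences only if the file passes the check (`my_repeats.fa` is a placeholder file name).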