clemgoub / dnaPipeTE

dnaPipeTE (for de-novo assembly & annotation Pipeline for Transposable Elements) is a pipeline designed to find, annotate and quantify Transposable Elements in small samples of NGS datasets. It is very useful for quantifying the proportion of TEs in newly sequenced genomes, since it does not require a genome assembly and works on small datasets (< 1X).

FileNotFoundError: [Errno 2] No such file or directory: '/dnapipete_3/Trinity.fasta.out' #50

Closed: mkim0327 closed this issue 3 years ago

mkim0327 commented 3 years ago

Hello!

I am trying to run dnaPipeTE with publicly available WGS data.

I would greatly appreciate it if you could help me with the following error message:

parseTagData: ID field not to EMBL spec "SNAP-OL2 repeatmasker; DNA; ???; BP. " from DE RepbaseID: SNAP-OL2XX at /home/Softwares/dnaPipeTE/bin/RepeatMasker/RepeatMasker line 7611.

Traceback (most recent call last):
  File "./dnaPipeTE.py", line 698, in <module>
    RepeatMasker(config['DEFAULT']['RepeatMasker'], args.RepeatMasker_library, args.RM_species, args.cpu, args.output_folder, args.RM_threshold)
  File "./dnaPipeTE.py", line 381, in __init__
    self.repeatmasker_run()
  File "./dnaPipeTE.py", line 400, in repeatmasker_run
    with open(self.output_folder+"/Trinity.fasta.out", 'r') as trinity_handle:
FileNotFoundError: [Errno 2] No such file or directory: '/mnt/data/Data/dnapipete_3/Trinity.fasta.out'

clemgoub commented 3 years ago

Hi there!

It seems that the program can't find the output of RepeatMasker. There could be several reasons, but I would first check whether Trinity ran normally by looking for the file "Trinity.fasta" in the output folder. If the file is present and contains FASTA sequences, that step worked. The next step would be to check whether RepeatMasker is installed properly.
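The two checks above can be scripted. The sketch below is not part of dnaPipeTE. The helper name check_dnapipete_outputs is hypothetical. The file names Trinity.fasta and Trinity.fasta.out come from the traceback in this issue. "Contains sequences" is approximated as "has at least one FASTA header line".

```python
from pathlib import Path


def check_dnapipete_outputs(output_folder: str) -> dict:
    """Report whether the Trinity and RepeatMasker outputs exist (hypothetical helper)."""
    folder = Path(output_folder)
    trinity = folder / "Trinity.fasta"        # Trinity assembly of the sampled reads
    rm_out = folder / "Trinity.fasta.out"     # RepeatMasker output that dnaPipeTE opens

    has_sequences = False
    if trinity.is_file():
        with trinity.open() as handle:
            # A non-empty FASTA file has at least one header line starting with '>'
            has_sequences = any(line.startswith(">") for line in handle)

    return {
        "trinity_fasta_present": trinity.is_file(),
        "trinity_has_sequences": has_sequences,
        "repeatmasker_out_present": rm_out.is_file(),
    }
```

For example, check_dnapipete_outputs("/mnt/data/Data/dnapipete_3") reports each status. If Trinity.fasta has sequences but Trinity.fasta.out is missing, the failure is on the RepeatMasker side.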

Can you send me the full log (stdout and stderr) as well as the complete command line you use to run dnaPipeTE? That will help me pinpoint the source of the error!
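Here is one way to capture both logs in separate files when launching a command from Python. It is a minimal sketch, not a dnaPipeTE feature, and the function name run_with_logs is hypothetical. In a shell, the equivalent is the 1>output.txt 2>error.txt redirection.

```python
import subprocess
from typing import List


def run_with_logs(cmd: List[str], stdout_path: str, stderr_path: str) -> int:
    """Run cmd, writing stdout and stderr to separate files; return the exit code."""
    with open(stdout_path, "w") as out, open(stderr_path, "w") as err:
        # No shell is involved: cmd is passed as an argument list
        completed = subprocess.run(cmd, stdout=out, stderr=err)
    return completed.returncode
```

For example: run_with_logs(["python3", "./dnaPipeTE.py", "-input", "reads.fastq", ...], "output.txt", "error.txt"). Here the reads file name is a placeholder, and the remaining arguments are whatever you normally pass.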

Thanks for using dnaPipeTE!

Cheers,

Clément

Christian-Ramos-Uria commented 3 years ago

Hi!

I got the same error while trying to run the analysis on the test dataset. Trinity seems to be working fine, as I get a Trinity.fasta file. I think the problem arises here:

"Species "All" is not known to RepeatMasker. There may not be any TE families defined in the libraries for this species/clade or there may be an error in the spelling. Please check your entry against the NCBI Taxonomy database and/or try using a broader clade or related species instead. The full list of species/clades defined in the library may be obtained using the famdb.py script."

I attached the full log for reference: log.txt

My command line is: python3 ./dnaPipeTE.py -input ./test/test_dataset.fastq -output ~/Thesis/dnaPipeTE/test -genome_size 2000000 -genome_coverage 0.5 -sample_number 2

Cheers, Christian

clemgoub commented 3 years ago

Hello,

Yes, this is probably because this mode used to work when the Repbase libraries were installed, and it looks like your version of RepeatMasker (4.1.x) is more recent than the one I use with my dnaPipeTE install (4.0.x).

Right now I would recommend using an external TE library with the option -RM_lib <file.fasta>. If you have an older version of RepeatMasker, you could try to locate the file called "specieslib", buried somewhere in the RepeatMasker/Libraries/ subfolders. It used to be the FASTA file generated from Repbase back when no subscription was required. Please DM me if you need help with that.
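Before passing a custom library with -RM_lib, it may help to sanity-check the FASTA file. The sketch below is not part of dnaPipeTE, and summarize_fasta_library is a hypothetical name. It only counts records and flags two problems: duplicate IDs, and headers with no sequence. It does not validate RepeatMasker's own header conventions, such as the "#Class/Family" suffix.

```python
def summarize_fasta_library(path: str) -> dict:
    """Count records and flag duplicate IDs or empty sequences in a FASTA file."""
    records = 0
    seen, duplicates, empty = set(), set(), []
    current_id, current_len = None, 0   # record being read and its residue count
    with open(path) as handle:
        for raw in handle:
            line = raw.strip()
            if not line:
                continue
            if line.startswith(">"):
                # A new header closes the previous record
                if current_id is not None and current_len == 0:
                    empty.append(current_id)
                tokens = line[1:].split()
                # The ID is the first whitespace-delimited token after '>'
                current_id = tokens[0] if tokens else ""
                current_len = 0
                records += 1
                if current_id in seen:
                    duplicates.add(current_id)
                seen.add(current_id)
            else:
                current_len += len(line)
    # Close the last record in the file
    if current_id is not None and current_len == 0:
        empty.append(current_id)
    return {"records": records, "duplicate_ids": sorted(duplicates), "empty_records": empty}
```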

Both of your comments make me think that I need to do something about it, and I will try to fix this issue soon.

Best,

Clément

Christian-Ramos-Uria commented 3 years ago

Hi,

It turns out that RepeatMasker (4.1.x) will build the libraries if the -species flag is specified. For the test dataset, I added "-species diptera", and it worked!
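The two workarounds in this thread are -RM_lib with an external library and -species with a named clade. The sketch below builds a dnaPipeTE argument list using one or the other. All flag names are taken from the commands in this issue. The helper build_dnapipete_cmd is hypothetical. Requiring exactly one of the two options is a choice of this sketch, not a documented dnaPipeTE rule.

```python
from typing import List, Optional


def build_dnapipete_cmd(fastq: str, output: str, genome_size: int,
                        genome_coverage: float, sample_number: int,
                        rm_lib: Optional[str] = None,
                        species: Optional[str] = None,
                        script: str = "./dnaPipeTE.py") -> List[str]:
    """Assemble a dnaPipeTE argv using either -RM_lib or -species (hypothetical helper)."""
    # Policy of this sketch: exactly one annotation source
    if (rm_lib is None) == (species is None):
        raise ValueError("pass exactly one of rm_lib or species")
    cmd = [
        "python3", script,
        "-input", fastq,
        "-output", output,
        "-genome_size", str(genome_size),
        "-genome_coverage", str(genome_coverage),
        "-sample_number", str(sample_number),
    ]
    if rm_lib is not None:
        cmd += ["-RM_lib", rm_lib]      # external TE library in FASTA format
    else:
        cmd += ["-species", species]    # clade name handed on to RepeatMasker
    return cmd
```

The resulting list can be passed to subprocess.run. Because no shell is involved, a path such as ~/Thesis/... is not expanded; apply os.path.expanduser first if needed.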

Cheers,

Christian

clemgoub commented 3 years ago

Excellent! Thank you for sharing the tip =)

Cheers,

Clément

Y-Vdb commented 6 days ago

Hi Clément,

Unfortunately, I've run into the same error. I'm asking for your help because I'm already using an external TE library. Please find my command below, along with the STD_OUTPUT here and the STD_ERROR here:

python3 /users/yvan/dnaPipeTE.py \
  -input dnaPipeTEst/data/Darwinula_stevensoni_250_350_pass_paired_1_fixed.fastq \
  -output dnaPipeTEst/output_dstevensoni/ \
  -cpu 8 \
  -sample_number 2 \
  -genome_size 455000000 \
  -genome_coverage 0.15 \
  -RM_lib Ds_ONT_EarlGrey/Ds_ONT_EarlGrey_Database/Ds_ONT_EarlGrey-families.fa \
  -RM_t 0.2 \
  -contig_length 200 \
  1>/users/yvan/output.txt 2>/users/yvan/error.txt
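The following sketch estimates how much data each sample represents with -genome_size 455000000 and -genome_coverage 0.15. It assumes that each sample targets roughly genome_size × genome_coverage bases. That rule is an assumption here, not taken from the dnaPipeTE source. The read length of 150 bp and the function name bases_and_reads_per_sample are also hypothetical.

```python
from typing import Tuple


def bases_and_reads_per_sample(genome_size: int, genome_coverage: float,
                               read_length: int) -> Tuple[int, int]:
    """Approximate bases and reads drawn per sample (assumed sampling rule)."""
    # Assumption: each sample targets genome_size x genome_coverage bases
    bases = round(genome_size * genome_coverage)
    # Whole reads needed to reach that target, rounding down
    reads = bases // read_length
    return bases, reads
```

Under these assumptions, bases_and_reads_per_sample(455_000_000, 0.15, 150) gives 68,250,000 bases, or 455,000 reads of 150 bp, per sample. With -sample_number 2, that is about 136,500,000 bases in total.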

Thank you kindly in advance for your help! Looking forward to obtaining results with your wonderful tool.

clemgoub commented 6 days ago

Hello Yelle,

Thanks for your kind words! First, I would like to ask whether you are using the Docker/Singularity version of dnaPipeTE. I see in your error files that the version says "container", but from your command line I have the impression that you are not actually using the container for the dependencies. If that's the case, I encourage you to use it as described here: https://tehub.org/tutorials/docs/dnaPipeTE
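One rough way to check whether a process runs inside a container is to look for markers that container runtimes commonly leave. This sketch is a heuristic, not something dnaPipeTE does itself, and detect_container is a hypothetical name. It assumes two things:

- Apptainer/Singularity export APPTAINER_CONTAINER or SINGULARITY_CONTAINER.
- Docker creates /.dockerenv.

Both are usual but not guaranteed.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def detect_container(environ: Optional[Mapping[str, str]] = None,
                     dockerenv: str = "/.dockerenv") -> Optional[str]:
    """Guess the container runtime from common markers; None if none are found."""
    env = os.environ if environ is None else environ
    # Apptainer/Singularity normally set these variables inside a running container
    if "APPTAINER_CONTAINER" in env or "SINGULARITY_CONTAINER" in env:
        return "singularity/apptainer"
    # Docker usually creates this marker file at the container root
    if Path(dockerenv).exists():
        return "docker"
    return None
```

Calling detect_container() with no arguments inspects the current environment. The parameters exist only so the check can be tested.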

If you are already doing this, let me know, and I'll dig deeper!

Cheers,

Clément