TobyBaril / EarlGrey

Earl Grey: A fully automated TE curation and annotation pipeline

[E::fai_read] Could not understand FASTA index and RepeatLandscape #121

Closed: enriquepola1996 closed this issue 1 month ago

enriquepola1996 commented 1 month ago

Hello, thank you very much for this nice tool. I am running it on a fungal genome for the first time, but there seems to have been a problem at the end: "Could not understand FASTA index". How can I solve this? At the end, only one PDF was returned.

I would greatly appreciate any help.

My installation:

conda create -n earlgrey -c conda-forge -c bioconda earlgrey=4.2.4

My commands:

earlGrey -g fungal_T9_kmer.fasta -s fungal_T9_kmer -t 8 -r Fungi -d yes -o ./earlGreyOutputs

My output files:

(earlgrey) enrique@DESKTOP-62E8R2K:~/earlgrey/earlGreyOutputs/fungal _T9_kmer_EarlGrey/fungal _T9_kmer_summaryFiles$ ls -l
total 41444
-rw-r--r-- 1 enrique enrique 96364 Jul 9 05:16 fungal _T9_kmer-families.fa.strained
-rw-r--r-- 1 enrique enrique 78467 Jul 9 05:13 fungal _T9_kmer.familyLevelCount.txt
-rw-r--r-- 1 enrique enrique 493491 Jul 9 05:16 fungal _T9_kmer.filteredRepeats.bed
-rw-r--r-- 1 enrique enrique 1110064 Jul 9 05:16 fungal _T9_kmer.filteredRepeats.gff
-rw-r--r-- 1 enrique enrique 350 Jul 9 05:13 fungal _T9_kmer.highLevelCount.txt
-rw-r--r-- 1 enrique enrique 40519788 Jul 9 05:16 fungal _T9_kmer.softmasked.fasta
-rw-r--r-- 1 enrique enrique 7203 Jul 9 05:13 fungal _T9_kmer.summaryPie.pdf
-rw-r--r-- 1 enrique enrique 112749 Jul 9 05:16 fungal _T9_kmer_combined_library.fasta

My log file:


<<< Generating Summary Plots >>>
 [1] "/home/enrique/miniconda3/envs/earlgrey/lib/R/bin/exec/R"                                                                                                 
 [2] "--no-echo"                                                                                                                                                   
 [3] "--no-restore"                                                                                                                                                
 [4] "--file=/home/enrique/miniconda3/envs/earlgrey/share/earlgrey-4.2.4-1/scripts//autoPie.R"                                                                 
 [5] "--args"                                                                                                                                                      
 [6] "/home/enrique/earlgrey/earlGreyOutputs/fungal _T9_kmer_EarlGrey/fungal _T9_kmer_mergedRepeats/looseMerge/fungal _T9_kmer.filteredRepeats.bed"
 [7] "/home/enrique/earlgrey/earlGreyOutputs/fungal _T9_kmer_EarlGrey/fungal _T9_kmer_mergedRepeats/looseMerge/fungal _T9_kmer.filteredRepeats.gff"
 [8] "39853892"                                                                                                                                                    
 [9] "/home/enrique/earlgrey/earlGreyOutputs/fungal _T9_kmer_EarlGrey/fungal _T9_kmer_summaryFiles/fungal _T9_kmer.summaryPie.pdf"                 
[10] "/home/enrique/earlgrey/earlGreyOutputs/fungal _T9_kmer_EarlGrey/fungal _T9_kmer_summaryFiles/fungal _T9_kmer.highLevelCount.txt"             
Splitting repeat library
Reading in gff
Starting calculations
WARNING. chromosome (contig_8) was not found in the FASTA file. Skipping.
multiprocessing.pool.RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/home/enrique/miniconda3/envs/earlgrey/lib/python3.9/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/home/enrique/miniconda3/envs/earlgrey/lib/python3.9/multiprocessing/pool.py", line 48, in mapstar
    return list(map(*args))
  File "/home/enrique/miniconda3/envs/earlgrey/share/earlgrey-4.2.4-1/scripts//divergenceCalc/divergence_calc.py", line 116, in outer_func
    a = a.sequence(fi=genome_path, fo=query_path, s=True)
  File "/home/enrique/miniconda3/envs/earlgrey/lib/python3.9/site-packages/pybedtools/bedtool.py", line 907, in decorated
    result = method(self, *args, **kwargs)
  File "/home/enrique/miniconda3/envs/earlgrey/lib/python3.9/site-packages/pybedtools/bedtool.py", line 388, in wrapped
    stream = call_bedtools(
  File "/home/enrique/miniconda3/envs/earlgrey/lib/python3.9/site-packages/pybedtools/helpers.py", line 456, in call_bedtools
    raise BEDToolsError(subprocess.list2cmdline(cmds), stderr)
pybedtools.helpers.BEDToolsError: 
Command was:

    bedtools getfasta -s -fo tmp//qseqs/808 -fi /home/enrique/earlgrey/fungal _T9_kmer.fasta -bed /tmp/pybedtools.vjri15g4.tmp

Error message was:
[E::fai_read] Could not understand FASTA index /home/enrique/earlgrey/fungal _T9_kmer.fasta.fai line 131
[E::fai_load3_core] Failed to read FASTA index /home/enrique/earlgrey/fungal _T9_kmer.fasta.fai
Warning: malformed fasta index file /home/enrique/earlgrey/fungal _T9_kmer.fasta

"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/enrique/miniconda3/envs/earlgrey/share/earlgrey-4.2.4-1/scripts//divergenceCalc/divergence_calc.py", line 203, in <module>
    results = pool.map(func, chunks)
  File "/home/enrique/miniconda3/envs/earlgrey/lib/python3.9/multiprocessing/pool.py", line 364, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "/home/enrique/miniconda3/envs/earlgrey/lib/python3.9/multiprocessing/pool.py", line 771, in get
    raise self._value
pybedtools.helpers.BEDToolsError: 
Command was:

    bedtools getfasta -s -fo tmp//qseqs/808 -fi /home/enrique/earlgrey/fungal _T9_kmer.fasta -bed /tmp/pybedtools.vjri15g4.tmp

Error message was:
[E::fai_read] Could not understand FASTA index /home/enrique/earlgrey/fungal _T9_kmer.fasta.fai line 131
[E::fai_load3_core] Failed to read FASTA index /home/enrique/earlgrey/fungal _T9_kmer.fasta.fai
Warning: malformed fasta index file /home/enrique/earlgrey/fungal _T9_kmer.fasta

Error in open.connection(con, open) : cannot open the connection
Calls: %>% ... connection -> connectionForResource -> open -> open.connection
In addition: Warning message:
In open.connection(con, open) :
  cannot open file '/home/enrique/earlgrey/earlGreyOutputs/fungal _T9_kmer_EarlGrey/fungal _T9_kmer_RepeatLandscape/fungal _T9_kmer.filteredRepeats.withDivergence.gff': No such file or directory
Execution halted
cp: cannot stat '/home/enrique/earlgrey/earlGreyOutputs/fungal _T9_kmer_EarlGrey/fungal _T9_kmer_RepeatLandscape/*.pdf': No such file or directory

TobyBaril commented 1 month ago

Hi, it looks like there might have been a parsing error with the FASTA, as there is a space in the filenames between "fungal" and "_T9". I think this happens within the divergence_calc.py script. @jamesdgalbraith might have better insight into this. We are also about to push some updates, which I am testing at the moment, that might solve this...
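
For anyone hitting the same message, the two suspects here (whitespace in the genome path, and a malformed .fai index) can be checked with a small sketch like the one below. It is not part of Earl Grey: the helper names `path_has_whitespace` and `malformed_fai_lines` are made up for illustration, and the check assumes the usual five-column, tab-separated FASTA index layout (name, length, offset, bases per line, bytes per line).

```python
def path_has_whitespace(path):
    """Return True if any character in the path is whitespace."""
    return any(ch.isspace() for ch in str(path))


def malformed_fai_lines(fai_text):
    """Return 1-based numbers of lines that do not look like FASTA index rows.

    Assumption: each row has five tab-separated columns
    (NAME, LENGTH, OFFSET, LINEBASES, LINEWIDTH), the last four being
    non-negative integers.
    """
    bad = []
    for number, line in enumerate(fai_text.splitlines(), start=1):
        fields = line.split("\t")
        if len(fields) != 5 or not fields[0]:
            bad.append(number)
        elif not all(field.isdigit() for field in fields[1:]):
            bad.append(number)
    return bad


if __name__ == "__main__":
    # Path copied from the error message in this thread.
    genome = "/home/enrique/earlgrey/fungal _T9_kmer.fasta"
    print("whitespace in path:", path_has_whitespace(genome))

    # Tiny made-up index: the first row is well formed, the second is not.
    example_index = "contig_1\t1000\t10\t60\t61\ncontig_2 broken row\n"
    print("malformed index lines:", malformed_fai_lines(example_index))
```

In practice the index text would be read from the `.fai` file sitting next to the genome. If a bad line is reported, one option is to delete that `.fai` so it is rebuilt on the next run, and to rename the genome so the path has no spaces. Whether that alone resolves the error here is an assumption.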

enriquepola1996 commented 1 month ago

Hello, I repeated the run and this time it finished and generated the graphs. I'm just worried because warnings and an error appeared at the end of the process:

log file:

 [9] "/home/enriquepola/earlgrey_ascomycota/earlGreyOutputs_Ascomycota/fungal_T9_kmer_EarlGrey/fungal_T9_kmer_summaryFiles/fungal_T9_kmer.summaryPie.pdf"
[10] "/home/enriquepola/earlgrey_ascomycota/earlGreyOutputs_Ascomycota/fungal_T9_kmer_EarlGrey/fungal_T9_kmer_summaryFiles/fungal_T9_kmer.highLevelCount.txt"
Splitting repeat library
Reading in gff
Starting calculations
Finished calculations
Total run time for  11605  rows was  333.69530034065247  seconds
Warning message:
Removed 2 rows containing missing values or values outside the scale range
(`geom_col()`).
Warning message:
Removed 2 rows containing missing values or values outside the scale range
(`geom_col()`).
Error in combine_vars(data, params$plot_env, rows, drop = params$drop) :
  At least one layer must contain all faceting variables: `subclass`
✖ Plot is missing `subclass`
✖ Layer 1 is missing `subclass`
Warning messages:
1: Removed 1 row containing missing values or values outside the scale range
(`geom_col()`).
2: Graphs cannot be vertically aligned unless the axis parameter is set. Placing graphs unaligned.

........
<<< TE library, Summary Figures, and TE Quantifications in Standard Formats Can Be Found in /home/enriquepola/earlgrey_ascomycota/earlGreyOutputs_Ascomycota/fungal_T9_kmer_EarlGrey/fungal_T9_kmer_summaryFiles/ >>>

The process ended with these results:

(base) enriquepola@DESKTOP-62E8R2K:~/earlgrey_ascomycota/earlGreyOutputs_Ascomycota/fungal_T9_kmer_EarlGrey/fungal_T9_kmer_summaryFiles$ ls
fungal_T9_kmer-families.fa.strained  fungal_T9_kmer.summaryPie.pdf
fungal_T9_kmer.familyLevelCount.txt  fungal_T9_kmer_classification_landscape.pdf
fungal_T9_kmer.filteredRepeats.bed   fungal_T9_kmer_combined_library.fasta
fungal_T9_kmer.filteredRepeats.gff   fungal_T9_kmer_split_class_landscape.pdf
fungal_T9_kmer.highLevelCount.txt    fungal_T9_kmer_superfamily_div_plot.pdf
fungal_T9_kmer.softmasked.fasta

TobyBaril commented 1 month ago

Hi, this is okay. The R warnings are because the Rscript has to be generic enough to account for all TE families, some of which will not be found in your species of interest. I haven't silenced the warnings as they are sometimes useful for debugging, but you have the expected files!
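
To confirm that a run left the same set of summary files as the listing above, a quick check along these lines could be used. The suffix list is copied from the `ls` output in this thread rather than from Earl Grey's documentation, so treat it as an assumption; `missing_summary_files` is a hypothetical helper, not an Earl Grey function.

```python
import tempfile
from pathlib import Path

# Suffixes taken from the `ls` output in this thread (an assumption, not an
# official list). Each one follows the species name passed with -s.
EXPECTED_SUFFIXES = [
    "-families.fa.strained",
    ".familyLevelCount.txt",
    ".filteredRepeats.bed",
    ".filteredRepeats.gff",
    ".highLevelCount.txt",
    ".softmasked.fasta",
    ".summaryPie.pdf",
    "_classification_landscape.pdf",
    "_combined_library.fasta",
    "_split_class_landscape.pdf",
    "_superfamily_div_plot.pdf",
]


def missing_summary_files(summary_dir, species):
    """Return the expected file names that are absent from summary_dir."""
    summary_dir = Path(summary_dir)
    return [
        species + suffix
        for suffix in EXPECTED_SUFFIXES
        if not (summary_dir / (species + suffix)).is_file()
    ]


if __name__ == "__main__":
    # Demo on a throwaway directory holding only the pie chart, which mimics
    # the first run in this thread, where only that PDF was produced.
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / "fungal_T9_kmer.summaryPie.pdf").touch()
        for name in missing_summary_files(tmp, "fungal_T9_kmer"):
            print("missing:", name)
```

Pointing the function at the real `..._summaryFiles` directory with the `-s` species name should return an empty list when every file in the listing is present.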

TobyBaril commented 1 month ago

I can double-check for you to be sure. Could you attach fungal_T9_kmer.filteredRepeats.gff and fungal_T9_kmer.highLevelCount.txt here, if that is okay? Then I can confirm that you have the required results.

enriquepola1996 commented 1 month ago

Yes @TobyBaril. Attached: fungal_T9_kmer.filteredRepeats.gff.txt, fungal_T9_kmer.highLevelCount.txt

In the end my command was:

earlGrey -g fungal_T9_kmer.fasta -s fungal_T9_kmer -t 8 -r Ascomycota -d yes -o ./earlGreyOutputs_Ascomycota

Thanks so much for your support.

TobyBaril commented 1 month ago

Hi @enriquepola1996, I've had a look at the outputs and the summaries look correct. Feel free to reach out if you have any other questions!

enriquepola1996 commented 1 month ago

Thanks so much.