TobyBaril / EarlGrey

Earl Grey: A fully automated TE curation and annotation pipeline

Warning: malformed fasta index file #134

Open JonEilers opened 2 weeks ago

JonEilers commented 2 weeks ago

Hi, it looks like I am having a problem similar to #121.

I installed Earl Grey via conda:

  earlgrey                              4.4.4         h4ac6f70_0              bioconda   

I've uploaded the fasta index file here

<<< Generating Summary Plots >>>
 [1] "/home/jon/micromamba/envs/earlgrey/lib/R/bin/exec/R"                                                                                                                   
 [2] "--no-echo"                                                                                                                                                             
 [3] "--no-restore"                                                                                                                                                          
 [4] "--file=/home/jon/micromamba/envs/earlgrey/share/earlgrey-4.4.4-0/scripts//autoPie.R"                                                                                   
 [5] "--args"                                                                                                                                                                
 [6] "/home/jon/Desktop/USDA_bartlett/earlgrey/pyrus_communis_bartlett_EarlGrey/pyrus_communis_bartlett_mergedRepeats/looseMerge/pyrus_communis_bartlett.filteredRepeats.bed"
 [7] "/home/jon/Desktop/USDA_bartlett/earlgrey/pyrus_communis_bartlett_EarlGrey/pyrus_communis_bartlett_mergedRepeats/looseMerge/pyrus_communis_bartlett.filteredRepeats.gff"
 [8] "498038199"                                                                                                                                                             
 [9] "/home/jon/Desktop/USDA_bartlett/earlgrey/pyrus_communis_bartlett_EarlGrey/pyrus_communis_bartlett_summaryFiles/pyrus_communis_bartlett.summaryPie.pdf"                 
[10] "/home/jon/Desktop/USDA_bartlett/earlgrey/pyrus_communis_bartlett_EarlGrey/pyrus_communis_bartlett_summaryFiles/pyrus_communis_bartlett.highLevelCount.txt"             
Splitting repeat library
Reading in gff
Starting calculations
WARNING. chromosome (Chr7_pilon) was not found in the FASTA file. Skipping.
WARNING. chromosome (Chr4_pilon) was not found in the FASTA file. Skipping.
multiprocessing.pool.RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/home/jon/micromamba/envs/earlgrey/lib/python3.9/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/home/jon/micromamba/envs/earlgrey/lib/python3.9/multiprocessing/pool.py", line 48, in mapstar
    return list(map(*args))
  File "/home/jon/micromamba/envs/earlgrey/share/earlgrey-4.4.4-0/scripts//divergenceCalc/divergence_calc.py", line 117, in outer_func
    a = a.sequence(fi=genome_path, fo=query_path, s=True)
  File "/home/jon/micromamba/envs/earlgrey/lib/python3.9/site-packages/pybedtools/bedtool.py", line 907, in decorated
    result = method(self, *args, **kwargs)
  File "/home/jon/micromamba/envs/earlgrey/lib/python3.9/site-packages/pybedtools/bedtool.py", line 388, in wrapped
    stream = call_bedtools(
  File "/home/jon/micromamba/envs/earlgrey/lib/python3.9/site-packages/pybedtools/helpers.py", line 456, in call_bedtools
    raise BEDToolsError(subprocess.list2cmdline(cmds), stderr)
pybedtools.helpers.BEDToolsError: 
Command was:

    bedtools getfasta -s -fo tmp//qseqs/122430 -fi /home/jon/Desktop/USDA_bartlett/PyrusCommunis_BartlettDHv2.0.pilon.fasta -bed /home/jon/Desktop/USDA_bartlett/earlgrey/pyrus_communis_bartlett_EarlGrey/pyrus_communis_bartlett_RepeatLandscape/tmp/pybedtools/pybedtools.ot9xcfnu.tmp

Error message was:
[E::fai_read] Could not understand FASTA index /home/jon/Desktop/USDA_bartlett/PyrusCommunis_BartlettDHv2.0.pilon.fasta.fai line 305
[E::fai_load3_core] Failed to read FASTA index /home/jon/Desktop/USDA_bartlett/PyrusCommunis_BartlettDHv2.0.pilon.fasta.fai
Warning: malformed fasta index file /home/jon/Desktop/USDA_bartlett/PyrusCommunis_BartlettDHv2.0.pilon.fasta

"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/jon/micromamba/envs/earlgrey/share/earlgrey-4.4.4-0/scripts//divergenceCalc/divergence_calc.py", line 216, in <module>
    results = pool.map(func, chunks)
  File "/home/jon/micromamba/envs/earlgrey/lib/python3.9/multiprocessing/pool.py", line 364, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "/home/jon/micromamba/envs/earlgrey/lib/python3.9/multiprocessing/pool.py", line 771, in get
    raise self._value
pybedtools.helpers.BEDToolsError: 
Command was:

    bedtools getfasta -s -fo tmp//qseqs/122430 -fi /home/jon/Desktop/USDA_bartlett/PyrusCommunis_BartlettDHv2.0.pilon.fasta -bed /home/jon/Desktop/USDA_bartlett/earlgrey/pyrus_communis_bartlett_EarlGrey/pyrus_communis_bartlett_RepeatLandscape/tmp/pybedtools/pybedtools.ot9xcfnu.tmp

Error message was:
[E::fai_read] Could not understand FASTA index /home/jon/Desktop/USDA_bartlett/PyrusCommunis_BartlettDHv2.0.pilon.fasta.fai line 305
[E::fai_load3_core] Failed to read FASTA index /home/jon/Desktop/USDA_bartlett/PyrusCommunis_BartlettDHv2.0.pilon.fasta.fai
Warning: malformed fasta index file /home/jon/Desktop/USDA_bartlett/PyrusCommunis_BartlettDHv2.0.pilon.fasta

Error in open.connection(con, open) : cannot open the connection
Calls: read_gff ... connection -> connectionForResource -> open -> open.connection
In addition: Warning message:
In open.connection(con, open) :
  cannot open file '/home/jon/Desktop/USDA_bartlett/earlgrey/pyrus_communis_bartlett_EarlGrey/pyrus_communis_bartlett_RepeatLandscape/pyrus_communis_bartlett.filteredRepeats.withDivergence.gff': No such file or directory
Execution halted
cp: cannot stat '/home/jon/Desktop/USDA_bartlett/earlgrey/pyrus_communis_bartlett_EarlGrey/pyrus_communis_bartlett_RepeatLandscape/*.pdf': No such file or directory
cp: cannot stat '/home/jon/Desktop/USDA_bartlett/earlgrey/pyrus_communis_bartlett_EarlGrey/pyrus_communis_bartlett_RepeatLandscape/*_summary_table.tsv': No such file or directory

              )  (
         (   ) )
         ) ( (
       _______)_
    .-'---------|  
       ( C|/\/\/\/\/|
    '-./\/\/\/\/|
     '_________'
      '-------'
    <<< Tidying Directories and Organising Important Files >>>

              )  (
         (   ) )
         ) ( (
       _______)_
    .-'---------|  
       ( C|/\/\/\/\/|
    '-./\/\/\/\/|
     '_________'
      '-------'
    <<< Generating Softmasked Genome >>>

              )  (
         (   ) )
         ) ( (
       _______)_
    .-'---------|  
       ( C|/\/\/\/\/|
    '-./\/\/\/\/|
     '_________'
      '-------'
    <<< Done in 37:14:03.00 >>>

              )  (
         (   ) )
         ) ( (
       _______)_
    .-'---------|  
       ( C|/\/\/\/\/|
    '-./\/\/\/\/|
     '_________'
      '-------'
    <<< TE library, Summary Figures, and TE Quantifications in Standard Formats Can Be Found in /home/jon/Desktop/USDA_bartlett/earlgrey/pyrus_communis_bartlett_EarlGrey/pyrus_communis_bartlett_summaryFiles/ >>>

TobyBaril commented 2 weeks ago

Hi, thanks for checking out Earl Grey! Hmm, the fasta index looks okay and there aren't any strange invisible characters on that line. @jamesdgalbraith any idea whether this is linked to weird tmp issues with pybedtools?
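
For anyone who wants to repeat this kind of check on their own index, here is a minimal sketch of a line-by-line sanity check. It assumes the usual five tab-separated columns of a FASTA `.fai` file (name, length, offset, bases per line, bytes per line). The function name `find_malformed_fai_lines` is hypothetical and is not part of Earl Grey, pybedtools, or BEDTools, and the name check is only a heuristic for stray invisible characters.

```python
def find_malformed_fai_lines(fai_path):
    """Return (line_number, reason) pairs for lines that do not look like
    valid FASTA .fai records. Heuristic sketch, not an official validator."""
    problems = []
    with open(fai_path, "rb") as handle:
        for lineno, raw in enumerate(handle, start=1):
            fields = raw.rstrip(b"\r\n").split(b"\t")
            if len(fields) != 5:
                problems.append((lineno, f"expected 5 fields, found {len(fields)}"))
            elif not fields[0] or any(b < 0x21 or b > 0x7E for b in fields[0]):
                # Flags empty names and control/non-ASCII bytes in the name.
                problems.append((lineno, "empty name or non-printable byte in name"))
            elif not all(f.isdigit() for f in fields[1:]):
                problems.append((lineno, "non-numeric length/offset/line-width field"))
            elif not raw.endswith(b"\n"):
                # A record without a newline may mean the file was read mid-write.
                problems.append((lineno, "no trailing newline (file may be truncated)"))
    return problems
```

If this returns an empty list for the finished index, as seems to be the case here, the error at line 305 was more likely transient than a problem with the file itself.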

jamesdgalbraith commented 3 days ago

Hi, thanks for reporting this issue. I'm not entirely sure what the underlying issue is, but based on similar issues other packages are having, I think that when pybedtools calls BEDTools it occasionally has problems parsing the genome's fai index when two processes are accessing it at the same time. What I'm failing to understand is why that produces the error reported here (E::fai_load3_core), as that's a SAMtools error! I'll see if I can figure out a way to avoid this.
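
If the concurrent-access hypothesis is right, one possible way to sidestep it would be to make sure a complete index exists before any worker starts, so that no process ever sees a half-written `.fai`. The sketch below is an illustration under that assumption, not the fix adopted in Earl Grey. `build_fai_records` and `ensure_fai` are hypothetical helpers, and the pure-Python indexer is simplified: it assumes every sequence has a non-empty name and uniform line lengths, and it does not validate either. In practice, running `samtools faidx` once on the genome beforehand should serve the same purpose.

```python
import os
import tempfile


def build_fai_records(fasta_path):
    """Compute (name, length, offset, linebases, linewidth) per sequence.
    Simplified: assumes well-formed FASTA with uniform line lengths."""
    records, current, pos = [], None, 0
    with open(fasta_path, "rb") as handle:
        for raw in handle:
            pos += len(raw)
            if raw.startswith(b">"):
                if current is not None:
                    records.append(tuple(current))
                # Name is the header up to the first whitespace; the first
                # base of the sequence starts at the current byte position.
                current = [raw[1:].split()[0].decode(), 0, pos, 0, 0]
            elif current is not None:
                bases = len(raw.rstrip(b"\r\n"))
                if current[3] == 0 and bases:
                    current[3], current[4] = bases, len(raw)
                current[1] += bases
    if current is not None:
        records.append(tuple(current))
    return records


def ensure_fai(fasta_path):
    """Create <fasta>.fai if it is missing, writing to a temporary file and
    renaming it into place so readers never see a partial index."""
    fai_path = fasta_path + ".fai"
    if os.path.exists(fai_path):
        return fai_path
    out_dir = os.path.dirname(os.path.abspath(fai_path))
    fd, tmp_path = tempfile.mkstemp(dir=out_dir, suffix=".fai.tmp")
    with os.fdopen(fd, "w") as out:
        for record in build_fai_records(fasta_path):
            out.write("\t".join(str(field) for field in record) + "\n")
    os.replace(tmp_path, fai_path)  # atomic rename within one filesystem
    return fai_path
```

Calling `ensure_fai(genome_path)` once in the parent process, before `pool.map(func, chunks)`, should mean the workers only ever read a finished index rather than racing to create one.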