dereneaton / ipyrad

Interactive assembly and analysis of RAD-seq data sets
http://ipyrad.readthedocs.io
GNU General Public License v3.0

IndexError: Merging after step 2 #502

Closed isaacovercast closed 1 year ago

isaacovercast commented 1 year ago

@amesclir Opening a new issue because this does seem to be a different problem, unrelated to disk availability.

This is a monster, and a bit of an edge case. It only happens with merged data, only for single-end data (PE data gets paired-end merged, which creates a tmp file ipyrad can find), and only for samples that are not replicated across both merged assemblies. I discovered that last condition because I tried to merge 2 dummy simulated datasets and it didn't produce the error: if you have technical replicates they get concatenated, which again creates a file that step 3 can find.

Two plates run through step 2 and then merged before running step 3:

 -------------------------------------------------------------
  ipyrad [v.0.9.88]
  Interactive assembly and analysis of RAD-seq data
 ------------------------------------------------------------- 
  Parallel connection | nehalem12: 1 cores

  Step 3: Clustering/Mapping reads within samples
  [####################] 100% 0:00:03 | indexing reference     
  [####################] 100% 0:00:46 | concatenating          
  [####################] 100% 0:01:22 | dereplicating          

  Encountered an Error.
  Message: IndexError: list index out of range
  Parallel connection closed.

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
File <string>:1

File /home/software/anaconda3/envs/ipyrad-0.9.88/lib/python3.10/site-packages/ipyrad/assemble/clustmap.py:741, in dereplicate(data, sample, nthreads)
    726 infiles = [
    727     os.path.join(
    728         data.dirs.edits,
   (...)
    738         "{}_declone.fastq".format(sample.name)),
    739 ]
    740 infiles = [i for i in infiles if os.path.exists(i)]
--> 741 infile = infiles[-1]
    743 # datatypes options
    744 strand = "plus"

IndexError: list index out of range
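
The failure mode in the traceback can be reproduced in isolation. The following is a minimal sketch, not ipyrad code: the function name `last_existing` and the file names are hypothetical, and it only mirrors the pattern visible at lines 740-741 of the traceback (filter candidate paths to those on disk, then index the last one).

```python
import os
import tempfile


def last_existing(candidates):
    """Filter candidate paths to those on disk, then take the last one."""
    existing = [path for path in candidates if os.path.exists(path)]
    # With an empty list this raises IndexError, as in the traceback.
    return existing[-1]


with tempfile.TemporaryDirectory() as edits_dir:
    # Hypothetical candidate names; none of them is created on disk,
    # which stands in for a sample with no tmp file for step 3 to find.
    candidates = [
        os.path.join(edits_dir, "sampleA_concat.fastq"),
        os.path.join(edits_dir, "sampleA_declone.fastq"),
    ]
    try:
        last_existing(candidates)
        outcome = "found"
    except IndexError as err:
        outcome = "IndexError: {}".format(err)

print(outcome)  # IndexError: list index out of range
```

When any one of the candidate files exists, for example after concatenation of technical replicates, the filtered list is non-empty and the same code returns without error.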

isaacovercast commented 1 year ago

Fixed: 49ca401b

isaacovercast commented 1 year ago

I added a try/except to catch the IndexError and then pull the fastq per sample from the sample.files.edits attribute.
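
A rough sketch of what such a fallback could look like is given here. It is not the code from commit 49ca401b: the helper name `choose_derep_infile`, the stand-in `sample` object, and the file names are hypothetical, and it assumes that `sample.files.edits` holds a list of (R1, R2) tuples of file paths.

```python
import os
import tempfile
from types import SimpleNamespace


def choose_derep_infile(candidates, sample):
    """Return the last candidate that exists, else fall back to the sample's edits file."""
    existing = [path for path in candidates if os.path.exists(path)]
    try:
        return existing[-1]
    except IndexError:
        # Assumption: sample.files.edits is a list of (R1, R2) tuples;
        # a plain string entry is also tolerated for this sketch.
        first = sample.files.edits[0]
        return first if isinstance(first, str) else first[0]


# Stand-in for a single-end sample that exists in only one merged assembly.
sample = SimpleNamespace(
    name="sampleA",
    files=SimpleNamespace(edits=[("plate1_edits/sampleA.trimmed_R1_.fastq.gz", "")]),
)

with tempfile.TemporaryDirectory() as edits_dir:
    # No tmp file was written for this sample, so the candidate list filters to empty.
    candidates = [os.path.join(edits_dir, "{}_declone.fastq".format(sample.name))]
    chosen = choose_derep_infile(candidates, sample)

print(chosen)  # plate1_edits/sampleA.trimmed_R1_.fastq.gz
```

Catching the IndexError keeps the existing lookup order for the common cases and only consults the per-sample attribute when no tmp file was produced.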