dereneaton / ipyrad

Interactive assembly and analysis of RAD-seq data sets
http://ipyrad.readthedocs.io
GNU General Public License v3.0

Step 7 doesn't automatically skip samples without data #388

Closed (edgardomortiz closed this issue 4 years ago)

edgardomortiz commented 4 years ago

I got the following error:

-------------------------------------------------------------
  ipyrad [v.0.9.31]
  Interactive assembly and analysis of RAD-seq data
 -------------------------------------------------------------
  Parallel connection | PlantBiodiversitys-Mac-Pro.local: 24 cores

  Step 1: Loading sorted fastq data to Samples
  [####################] 100% 0:00:17 | loading reads
  70 fastq files loaded to 70 Samples.

  Step 2: Filtering and trimming reads
  [####################] 100% 0:07:34 | processing reads

  Step 3: Clustering/Mapping reads within samples
  [####################] 100% 0:04:54 | dereplicating
  [####################] 100% 7:26:33 | clustering/mapping
  [####################] 100% 0:00:20 | building clusters
  [####################] 100% 0:00:06 | chunking clusters
  [####################] 100% 1:50:34 | aligning clusters
  [####################] 100% 0:01:48 | concat clusters
  [####################] 100% 0:00:24 | calc cluster stats

  Step 4: Joint estimation of error rate and heterozygosity
  skipping <ipyrad.Sample object Pe_194>; no clusters found.
  [####################] 100% 0:00:56 | inferring [H, E]

            These samples failed joint estimation and will be excluded from
            downstream analysis (probably very few highdepth reads):
            ['Pe_194']

  Step 5: Consensus base/allele calling
skipping samples not in state==4:
['Pe_194']
  Mean error  [0.00378 sd=0.00278]
  Mean hetero [0.00833 sd=0.00601]
  [####################] 100% 0:00:23 | calculating depths
  [####################] 100% 0:00:49 | chunking clusters
  [####################] 100% 0:07:16 | consens calling
  [####################] 100% 0:00:09 | indexing alleles

  Step 6: Clustering/Mapping across samples
skipping samples not in state==5:
['Pe_194']
  [####################] 100% 0:00:09 | concatenating inputs
  [####################] 100% 0:03:10 | clustering tier 1
  [####################] 100% 0:00:01 | concatenating inputs
  [####################] 100% 0:03:15 | clustering across
  [####################] 100% 0:00:03 | building clusters
  [####################] 100% 0:00:36 | aligning clusters

  Step 7: Filtering and formatting output files

  Encountered an Error.
  Message:
There are samples in this assembly that were not present in step 6. This is
likely due to branching or merging. The following samples are not in the step6
database:
{'Pe_194'}

So I had to branch and drop the sample, and then step 7 worked. I also found an error in the branching instructions at https://ipyrad.readthedocs.io/en/latest/8-branching.html#drop-samples-by-branching , where the commands read:

## branch and only keep 3 samples from assembly data1
>>> ipyrad -n data1 -b data2 1A0 1B0 1C0

## and/or, branch and only exclude 3 samples from assembly data1
>>> ipyrad -n data1 -b data3 - 1A0 1B0 1C0

What actually worked for branching was:

## branch and only keep 3 samples from assembly data1
>>> ipyrad -p params-data1.txt -b data2 1A0 1B0 1C0

## and/or, branch and only exclude 3 samples from assembly data1
>>> ipyrad -p params-data1.txt -b data3 - 1A0 1B0 1C0
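For anyone scripting this workaround, here is a minimal Python sketch that builds the exclusion command in the same form that worked above. The function name `branch_excluding` and the branch name `data1_no_Pe_194` are hypothetical, chosen only for illustration; the helper assembles the argument list and does not run ipyrad itself.

```python
import shlex


def branch_excluding(assembly, new_name, drop):
    """Build the CLI call that branches `assembly` into `new_name` without `drop`.

    Hypothetical helper: it mirrors the command form shown above,
    `ipyrad -p params-<assembly>.txt -b <new_name> - <samples...>`.
    """
    if not drop:
        raise ValueError("no samples to drop")
    # The lone "-" before the sample names means "exclude these"
    # rather than "keep only these".
    return ["ipyrad", "-p", f"params-{assembly}.txt", "-b", new_name, "-", *drop]


cmd = branch_excluding("data1", "data1_no_Pe_194", ["Pe_194"])
print(shlex.join(cmd))
```

The resulting list can be handed to `subprocess.run(cmd, check=True)`. Step 7 would then be run against the params file of the new branch, which I assume follows the same `params-<name>.txt` naming.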

isaacovercast commented 4 years ago

Thanks Edgardo. I fixed the documentation.

As for automatically skipping bad samples in step 7, it's tricky. At the moment there's no way to distinguish between a sample that failed an earlier step and a sample in a merged assembly that simply hasn't had that step run yet. I updated the error message to give a little more advice on how to proceed. I agree this is somewhat annoying, but I don't think we can change this behavior.
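
To illustrate the ambiguity, here is a toy Python sketch. It is not ipyrad's actual code: the dictionaries, the state numbers assigned to each sample, the helper `missing_from_step6`, and the sample names `Pe_001` and `new_sample` are all assumptions made for the example. It only shows why a per-sample state value cannot, on its own, separate the two cases.

```python
def missing_from_step6(sample_states, step6_names):
    """Return the samples in the assembly that are absent from the step 6 database."""
    return sorted(set(sample_states) - set(step6_names))


# Samples that made it into the (toy) step 6 database.
step6_names = {"Pe_001"}

# Case A: Pe_194 failed step 4, so it is assumed to be stuck at state 3.
failed_assembly = {"Pe_001": 6, "Pe_194": 3}

# Case B: a merged assembly where new_sample has simply not been run
# past step 3 yet (hypothetical sample name).
merged_assembly = {"Pe_001": 6, "new_sample": 3}

for label, states in [("failed", failed_assembly), ("merged", merged_assembly)]:
    missing = missing_from_step6(states, step6_names)
    # Both cases report one missing sample with the same state value,
    # so the state alone does not say whether it is safe to skip.
    print(label, missing, [states[name] for name in missing])
```

In both cases the missing sample looks the same from step 7's point of view, so silently dropping it would be wrong for the merged assembly. That is why the choice is left to the user via branching.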