dereneaton / ipyrad

Interactive assembly and analysis of RAD-seq data sets
http://ipyrad.readthedocs.io
GNU General Public License v3.0

Step 4 fails with low-coverage samples #281

Closed TomaszSuchan closed 4 years ago

TomaszSuchan commented 7 years ago

Hi! It seems that step 4 fails when there are samples with too few high-depth reads (e.g., blank samples). After removing them, everything runs well.

$ ipyrad -p params-radz.txt -s1234567 -f

 -------------------------------------------------------------
  ipyrad [v.0.7.17]
  Interactive assembly and analysis of RAD-seq data
 -------------------------------------------------------------
  New Assembly: radz
  establishing parallel connection:
  host compute node: [24 cores] on Trollius

  Step 1: Loading sorted fastq data to Samples
  [####################] 100%  loading reads         | 0:00:09
  84 fastq files loaded to 42 Samples.

  Step 2: Filtering reads
  [####################] 100%  processing reads      | 0:07:24

  Step 3: Clustering/Mapping reads
  [####################] 100%  dereplicating         | 0:01:49
  [####################] 100%  clustering            | 0:36:25
  [####################] 100%  building clusters     | 0:01:04
  [####################] 100%  chunking              | 0:00:11
  [####################] 100%  aligning              | 1:05:26
  [####################] 100%  concatenating         | 0:00:32

  Step 4: Joint estimation of error rate and heterozygosity
    skipping B1i6. Too few high depth reads (1).
    skipping B1i12. Too few high depth reads (1).
  [####################] 100%  inferring [H, E]      | 0:03:46
ERROR:ipyrad.core.assembly:The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

  Encountered an unexpected error (see ./ipyrad_log.txt)
  Error message is below -------------------------------
The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

dereneaton commented 6 years ago

Hi Tomasz,

Thanks for reporting this. I'm guessing the problem is caused by those two samples, which appear to have failed (they recover no high-depth clusters). ipyrad should skip over those samples, but it seems they are causing a problem instead, which is a bug. I'll take a look at it.
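The error text is NumPy's standard `ValueError`, raised when a multi-element array is used where Python expects a single boolean, such as in an `if` test. The snippet below is a hypothetical, minimal illustration of how a skipped sample could trigger this class of error. It is not ipyrad's actual step-4 code. The names `estimates`, `buggy_check`, and `fixed_check`, and the NaN placeholder for skipped samples, are invented for illustration.

```python
import numpy as np

# Hypothetical per-sample [H, E] estimates. Skipped samples are given NaN
# placeholders here; this is an assumption, not ipyrad's real data structure.
estimates = {
    "sampleA": np.array([0.012, 0.0015]),
    "B1i6": np.array([np.nan, np.nan]),
}

def buggy_check(hetero_error):
    # np.isnan returns a 2-element boolean array; using it in `if`
    # is ambiguous, so NumPy raises ValueError.
    if np.isnan(hetero_error):
        return "skipped"
    return "ok"

def fixed_check(hetero_error):
    # Reduce the element-wise result to one bool with .any(),
    # as the error message itself suggests.
    if np.isnan(hetero_error).any():
        return "skipped"
    return "ok"

try:
    buggy_check(estimates["B1i6"])
except ValueError as err:
    print(err)  # NumPy's "truth value ... is ambiguous" message

print({name: fixed_check(vals) for name, vals in estimates.items()})
```

Whether ipyrad's real fix took this form is not stated in the thread. The sketch only shows why the message mentions `a.any()` or `a.all()`.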

For now, to work around it, create a new branch that excludes the samples with very little data. You can do that by following these instructions: http://ipyrad.readthedocs.io/tutorial_advanced_cli.html#branching-and-selecting-a-subset-of-samples-by-sample-name

Or, briefly, type the following. The "-" after the new assembly name means "drop the samples listed after it":

ipyrad -p params-radz.txt -b newassembly - B1i6 B1i12

Then continue from step 4 with your new assembly:

ipyrad -p params-newassembly.txt -s 4567 -f
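If many samples are affected, the two commands above can be generated from per-sample counts. This is a minimal sketch. It assumes you have already collected the number of high-depth clusters per sample yourself. The `cluster_counts` dict, the `min_clusters` threshold, and the helper name `branch_commands` are all hypothetical; ipyrad does not provide them. Only the command shapes (`-b newname - SAMPLES`, then `-s 4567 -f` on `params-<newname>.txt`) are copied from the instructions above.

```python
import shlex

def branch_commands(params_file, new_name, cluster_counts, min_clusters):
    """Build ipyrad CLI commands that branch off low-depth samples.

    cluster_counts: hypothetical dict of sample name -> high-depth cluster
    count that you gather yourself. Samples below min_clusters are dropped
    using the "-" branching syntax.
    """
    drop = sorted(name for name, n in cluster_counts.items() if n < min_clusters)
    if not drop:
        return []  # nothing to drop, so no branch is needed
    branch = ["ipyrad", "-p", params_file, "-b", new_name, "-"] + drop
    # Branching writes params-<new_name>.txt; resume from step 4 with it.
    resume = ["ipyrad", "-p", f"params-{new_name}.txt", "-s", "4567", "-f"]
    return [shlex.join(branch), shlex.join(resume)]

# Illustrative input: the B1i6/B1i12 counts come from the log above;
# the sampleA count and the threshold of 10 are made up.
counts = {"B1i6": 1, "B1i12": 1, "sampleA": 5230}
for cmd in branch_commands("params-radz.txt", "newassembly", counts, min_clusters=10):
    print(cmd)
```

`shlex.join` (Python 3.8+) is used so that sample names with unusual characters are quoted safely. Review the printed commands before running them.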

Cheers,


--
Deren A.R. Eaton
Associate Research Scientist, Yale University
Assistant Professor (starting 2017), Columbia University, Department of Ecology, Evolution, and Environmental Biology

TomaszSuchan commented 6 years ago

Hi Deren, yes, the problem was caused by these two 'blank' samples I was using to control for contamination. It would be great if ipyrad could skip them automatically. For now, I am just removing them from the dataset as you suggested.

Cheers, Tomasz

tahamimo commented 5 years ago

Hi all, I have the same problem with ipyrad, but I don't have blank samples. Some samples just have fewer reads, though none are empty. Could this be happening for other reasons?

tahamimo commented 5 years ago

Another odd bug: I submitted several batch jobs, each with a different params file, and now each one fails separately with an error saying that blablabla_derep.fastq doesn't exist. Each job reports a different missing file, yet all of those files do exist in the same directory. How is that possible?

isaacovercast commented 5 years ago

Do those samples have fewer reads but still retain some high-depth clusters? If not, this problem would probably remain.

It's best not to mix different problems in GitHub issues. The second problem could be related to many different things, and you would need to provide much more information about your environment, the params you're using, etc. If you are running on a cluster, make sure the paths on the compute nodes are the same as on the head node (often they are not).
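One way to test the path suggestion is to run a small check inside a batch job on a compute node, then run the same check on the head node and compare the output. This is a minimal sketch that assumes Python 3 is available on both. The helper name `report_visible_files`, the directory you pass in, and the `*_derep.fastq` pattern are illustrative. The pattern is taken from the file name in the error report, not from a documented ipyrad layout.

```python
import socket
from pathlib import Path

def report_visible_files(directory, pattern="*_derep.fastq"):
    """Print and return the files matching `pattern` that this node can see."""
    path = Path(directory).resolve()
    host = socket.gethostname()
    if not path.is_dir():
        # The directory itself is missing or unmounted on this node.
        print(f"{host}: directory not visible: {path}")
        return []
    found = sorted(p.name for p in path.glob(pattern))
    print(f"{host}: {len(found)} file(s) matching {pattern} in {path}")
    return found
```

If the head node and a compute node print different resolved paths or different counts for the same directory argument, the jobs are not seeing the same filesystem layout.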

isaacovercast commented 5 years ago

Your second question is probably a better fit for the ipyrad gitter channel, since I suspect it is not an ipyrad issue but an environment issue:

https://gitter.im/dereneaton/ipyrad


tahamimo commented 5 years ago


Thanks, Isaac, and sorry for the unrelated question. Surprisingly, both problems were solved without my intervention! I just re-submitted the jobs separately. I think it is best not to submit different branched assemblies as separate batch jobs in parallel.

isaacovercast commented 5 years ago

It is 100% better not to submit different branches of the same assembly at the same time. This will often result in assembly failure, as you're seeing.

tahamimo commented 5 years ago

Hi again, Isaac! I keep running into the same problem. A different number of samples is listed as "Too few high depth reads (0.00)" on each run, and then ipyrad suddenly stops at step 4 with the following error:

Encountered an unexpected error (see ./ipyrad_log.txt)
Error message is below -------------------------------
The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

I have tried running it from scratch several times and still get the same error.

P.S. I have already removed low-coverage reads in a denovo assembly. I am now running a denovo-reference assembly and getting this error. I cannot remove these low-depth samples because they make up almost half of my data set!

P.P.S. I am not submitting several ipyrad batch jobs in parallel!

isaacovercast commented 5 years ago

What version of ipyrad are you running?

tahamimo commented 5 years ago

0.7.28; it is more stable. I tried 0.7.29, but that version is even worse!

isaacovercast commented 5 years ago

Hm, well, it's best to always use the most recent version, which for now is 0.7.30. Can you update to the newest version and try again? It's possible the problem is already fixed.

isaacovercast commented 4 years ago

I believe this issue is fixed in the current version (v.0.9.34), please reopen this issue if this is still a problem.