dereneaton / ipyrad

Interactive assembly and analysis of RAD-seq data sets
http://ipyrad.readthedocs.io
GNU General Public License v3.0

Step 7: ambiguities in reference sequence cause KeyError: ## #402

Closed laninsky closed 4 years ago

laninsky commented 4 years ago

@Pointyhead documented this in the gitter back in November 2019 with a KeyError: 66, but I just got tripped up by it again (KeyError: 86), so I thought I'd make and close an issue just so it is documented in the issues for the next person who runs into it.

A KeyError followed by a number (e.g. KeyError: 86) during step 7, immediately after the "building arrays" stage, is likely caused by ambiguity codes in your reference sequence.
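One way to check whether a given KeyError number fits this explanation is to read it as an ASCII byte value. This is an inference from the two reported numbers rather than anything confirmed in the ipyrad source: 66 and 86 decode to B and V, both of which are IUPAC ambiguity codes. The helper name below is hypothetical and only for illustration.

```python
# IUPAC nucleotide ambiguity codes and the bases each one stands for.
IUPAC_AMBIGUOUS = {
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
}


def explain_keyerror(code):
    """Interpret a KeyError number as an ASCII byte and describe it.

    Assumption: the number is the byte value of a character read from
    the reference sequence. This helper is hypothetical, not part of ipyrad.
    """
    char = chr(code)
    bases = IUPAC_AMBIGUOUS.get(char.upper())
    if bases is not None:
        return f"{code} -> '{char}' (ambiguity code for {bases})"
    return f"{code} -> '{char}' (not an IUPAC ambiguity code)"


# The two values reported in this thread.
for code in (66, 86):
    print(explain_keyerror(code))
# 66 -> 'B' (ambiguity code for CGT)
# 86 -> 'V' (ambiguity code for ACG)
```

If your number decodes to something outside this table, the cause may be different from the one described here.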

You can solve this by using maskambignuc from the EMBOSS package to replace the ambiguous sites in the reference with Ns before running ipyrad.
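If EMBOSS is not available, the same masking can be approximated with a few lines of Python. This is a minimal sketch and a stand-in for maskambignuc, not a reimplementation of it: it assumes a plain-text FASTA reference and treats every character other than A, C, G, T or N (either case) as ambiguous. The function names and file names are hypothetical.

```python
import re

# Anything that is not A, C, G, T or N (either case) is treated as ambiguous.
# Note: this also masks gap characters such as '-', and a lowercase
# ambiguity code is replaced by an uppercase N.
_AMBIGUOUS = re.compile(r"[^ACGTNacgtn]")


def mask_ambiguous(line):
    """Return one FASTA line with ambiguity codes replaced by N.

    Header lines (starting with '>') are returned unchanged.
    """
    if line.startswith(">"):
        return line
    body = line.rstrip("\r\n")
    ending = line[len(body):]  # keep the original line ending as-is
    return _AMBIGUOUS.sub("N", body) + ending


def mask_fasta(in_path, out_path):
    """Write a masked copy of in_path to out_path.

    Returns the number of sites that were replaced.
    """
    masked = 0
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            new = mask_ambiguous(line)
            # Substitution is one-for-one, so positions line up.
            masked += sum(a != b for a, b in zip(line, new))
            dst.write(new)
    return masked
```

Usage would look like mask_fasta("reference.fasta", "reference.masked.fasta"), with the masked file then given to ipyrad as the reference. A return value of 0 means no ambiguous sites were found, in which case the KeyError probably has another cause.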

Here's what the error looks like. Calling ipyrad:

ipyrad -p params-chickadee_ref.txt -s 7 -c 36 -f -d

Output:

loading Assembly: chickadee_ref
  from saved path: /scale_wlg_nobackup/filesets/nobackup/uoo00105/chickadees/chickadee_ref.json

 -------------------------------------------------------------
  ipyrad [v.0.9.51]
  Interactive assembly and analysis of RAD-seq data
 ------------------------------------------------------------- 
  Parallel connection | wbn201: 36 cores

  Step 7: Filtering and formatting output files 
  [####################] 100% 0:00:22 | applying filters       
  [####################] 100% 0:01:18 | building arrays        

  Encountered an Error.
  Message: KeyError: 86

  Parallel connection closed.
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<string> in <module>
/nesi/nobackup/uoo00105/chickadees/bin/miniconda3/lib/python3.7/site-packages/ipyrad/assemble/write_outputs.py in fill_snp_array(data, ntaxa, nsnps)
   2147         # fill for each taxon
   2148         for sidx in range(ntaxa):
-> 2149             resos = [DCONS[i] for i in snparr[sidx, :]]
   2150 
   2151             # pseudoref version
/nesi/nobackup/uoo00105/chickadees/bin/miniconda3/lib/python3.7/site-packages/ipyrad/assemble/write_outputs.py in <listcomp>(.0)
   2147         # fill for each taxon
   2148         for sidx in range(ntaxa):
-> 2149             resos = [DCONS[i] for i in snparr[sidx, :]]
   2150 
   2151             # pseudoref version
KeyError: 86

But after converting the ambiguous sites in the reference to Ns:

  Step 7: Filtering and formatting output files 
  [####################] 100% 0:00:15 | applying filters       
  [####################] 100% 0:04:00 | building arrays        
  [####################] 100% 0:01:34 | writing conversions    
  [####################] 100% 0:02:03 | indexing vcf depths    
  [####################] 100% 0:11:04 | writing vcf output     

  Parallel connection closed.

isaacovercast commented 4 years ago

This is probably the most useful and best documented issue submitted by a non-developer, and the single most selfless gesture by any ipyrad user ever. Thank you for taking the time to contribute. -isaac

laninsky commented 4 years ago

No worries Isaac! Least I can do given you guys make such an amazing package freely available to us!