dereneaton / ipyrad

Interactive assembly and analysis of RAD-seq data sets
http://ipyrad.readthedocs.io
GNU General Public License v3.0

Step 1: "Error tokenizing data" #474

Closed c-mccarron closed 2 years ago

c-mccarron commented 2 years ago

Hello, I have been trying to run the first step of ipyrad and keep running into problems. I cannot tell whether this comes from my params file, my barcodes file, or something else. From what I have read, this error generally happens with .csv files, but I am not using any. The error message is:

Traceback (most recent call last):
  File "/home/myfloradna/miniconda3/bin/ipyrad", line 10, in <module>
    sys.exit(main())
  File "/home/myfloradna/miniconda3/lib/python3.9/site-packages/ipyrad/main.py", line 605, in main
    CLI()
  File "/home/myfloradna/miniconda3/lib/python3.9/site-packages/ipyrad/main.py", line 69, in __init__
    self.get_assembly()
  File "/home/myfloradna/miniconda3/lib/python3.9/site-packages/ipyrad/main.py", line 368, in get_assembly
    data.set_params(key, param)
  File "/home/myfloradna/miniconda3/lib/python3.9/site-packages/ipyrad/core/assembly.py", line 493, in set_params
    setattr(self.params, param, newvalue)
  File "/home/myfloradna/miniconda3/lib/python3.9/site-packages/ipyrad/core/params.py", line 267, in __setattr__
    super().__setattr__(key, val)
  File "/home/myfloradna/miniconda3/lib/python3.9/site-packages/ipyrad/core/params.py", line 329, in barcodes_path
    self._data._link_barcodes()
  File "/home/myfloradna/miniconda3/lib/python3.9/site-packages/ipyrad/core/assembly.py", line 266, in _link_barcodes
    bdf = pd.read_csv(barcodefile[0], header=None, delim_whitespace=1)
  File "/home/myfloradna/miniconda3/lib/python3.9/site-packages/pandas/util/_decorators.py", line 311, in wrapper
    return func(*args, **kwargs)
  File "/home/myfloradna/miniconda3/lib/python3.9/site-packages/pandas/io/parsers/readers.py", line 680, in read_csv
    return _read(filepath_or_buffer, kwds)
  File "/home/myfloradna/miniconda3/lib/python3.9/site-packages/pandas/io/parsers/readers.py", line 581, in _read
    return parser.read(nrows)
  File "/home/myfloradna/miniconda3/lib/python3.9/site-packages/pandas/io/parsers/readers.py", line 1250, in read
    index, columns, col_dict = self._engine.read(nrows)
  File "/home/myfloradna/miniconda3/lib/python3.9/site-packages/pandas/io/parsers/c_parser_wrapper.py", line 225, in read
    chunks = self._reader.read_low_memory(nrows)
  File "pandas/_libs/parsers.pyx", line 805, in pandas._libs.parsers.TextReader.read_low_memory
  File "pandas/_libs/parsers.pyx", line 861, in pandas._libs.parsers.TextReader._read_rows
  File "pandas/_libs/parsers.pyx", line 847, in pandas._libs.parsers.TextReader._tokenize_rows
  File "pandas/_libs/parsers.pyx", line 1960, in pandas._libs.parsers.raise_parser_error
pandas.errors.ParserError: Error tokenizing data. C error: Expected 3 fields in line 188, saw 5

It seems likely that this is a problem with the miniconda installation or some larger problem with the computer. How do I go about addressing this issue?

isaacovercast commented 2 years ago

Looks like a problem with the barcodes file. Can you post your barcodes file here?

c-mccarron commented 2 years ago

1.1 GCATG TCGCAGG
1.2 GCATG CTCTGCA
1.4 GCATG CCTAGGT
1.8 GCATG GGATCAA
1.9 GCATG GCAAGAT
2.1 GCATG ATGGAGA
2.2 GCATG CTCGATG
2.3 GCATG GCTCGAA
2.4 GGTTG TCGCAGG
2.5 GGTTG CTCTGCA
2.6 GGTTG CCTAGGT
2.7 GGTTG GGATCAA
2.8 GGTTG GCAAGAT
2.9 GGTTG ATGGAGA
2.1 GGTTG CTCGATG
3.1 GGTTG GCTCGAA
3.2 GTAGT TCGCAGG
3.3 GTAGT CTCTGCA
3.4 GTAGT CCTAGGT
3.6 GTAGT GGATCAA
3.7 GTAGT GCAAGAT
3.8 GTAGT ATGGAGA
3.9 GTAGT CTCGATG
3.1 GTAGT GCTCGAA
4.4 AACCA TCGCAGG
4.6 AACCA CTCTGCA
4.9 AACCA CCTAGGT
5.2 AACCA GGATCAA
5.3 AACCA GCAAGAT
5.5 AACCA ATGGAGA
5.7 AACCA CTCGATG
5.8 AACCA GCTCGAA
5.9 AAGGA TCGCAGG
5.1 AAGGA CTCTGCA
6.1 AAGGA CCTAGGT
6.2 AAGGA GGATCAA
6.4 AAGGA GCAAGAT
6.5 AAGGA ATGGAGA
6.6 AAGGA CTCGATG
6.8 AAGGA GCTCGAA
6.9 AGCTA TCGCAGG
6.1 AGCTA CTCTGCA
7.1 AGCTA CCTAGGT
7.2 AGCTA GGATCAA
7.3 AGCTA GCAAGAT
7.4 AGCTA ATGGAGA
7.5 AGCTA CTCGATG
7.6 AGCTA GCTCGAA
7.7 TGCAT TCGCAGG
7.8 TGCAT CTCTGCA
7.9 TGCAT CCTAGGT
7.1 TGCAT GGATCAA
8.1 TGCAT GCAAGAT
8.2 TGCAT ATGGAGA

isaacovercast commented 2 years ago

Ah, I bet the sample names are being interpreted as floating point numbers instead of strings. Are you committed to these sample names? Are they meaningful within your metadata somehow? If you add a letter to the sample names, or use a dash or an underscore instead of a period, then it'll work fine. The file reader is just seeing this column of values formatted like 1.3 and assuming they are numbers, not strings. I can fix it in the code, but it might take time for the new version to propagate up to bioconda.

Also, you have 2.1 in there twice. Is that intentional? These will be treated as technical replicates and the reads will be pooled into one sample.
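The two checks suggested above (names that a parser could read as numbers, and names that occur more than once) can be sketched in a few lines of plain Python. This is only an illustration and not ipyrad's own code: the helper names `check_sample_names` and `prefix_names`, and the prefix letter `s`, are made up for this example. It assumes a whitespace-delimited barcodes file with the sample name in the first column.

```python
from collections import Counter


def check_sample_names(barcode_lines):
    """Return (numeric_looking, duplicated) sample names from barcode rows.

    Hypothetical helper: the sample name is assumed to be the first
    whitespace-separated field of each non-empty row.
    """
    names = [line.split()[0] for line in barcode_lines if line.strip()]

    # Names that Python can parse as a float, e.g. "1.9" or "2.1".
    numeric_looking = []
    for name in names:
        try:
            float(name)
        except ValueError:
            continue
        numeric_looking.append(name)

    # Names that appear on more than one row.
    counts = Counter(names)
    duplicated = sorted(name for name, n in counts.items() if n > 1)
    return numeric_looking, duplicated


def prefix_names(barcode_lines, prefix="s"):
    """Prepend a letter to every sample name so it reads as a string."""
    fixed = []
    for line in barcode_lines:
        if not line.strip():
            continue
        name, *barcodes = line.split()
        fixed.append(" ".join([prefix + name, *barcodes]))
    return fixed


# Three rows taken from the barcodes file posted above.
rows = [
    "1.9 GCATG GCAAGAT",
    "2.1 GCATG ATGGAGA",
    "2.1 GGTTG CTCGATG",
]
numeric, dupes = check_sample_names(rows)
print(numeric)                # ['1.9', '2.1', '2.1']
print(dupes)                  # ['2.1']
print(prefix_names(rows)[0])  # s1.9 GCATG GCAAGAT
```

Adding a prefix only changes how the names are read. Duplicated names stay duplicated and still need to be fixed by hand.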

c-mccarron commented 2 years ago

Thanks, I can go through and change these names. I will let you know if there are any problems. I am not sure how 2.1 got in there twice, but it's not intentional. I think it might have been a copy/paste error between two documents.

isaacovercast commented 2 years ago

Wait, was that your whole barcodes file? I think what I said earlier might not be the problem. Can you look at line 188 in your barcodes file? This is where the problem is.
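The message "Expected 3 fields in line 188, saw 5" means the parser found a row with more whitespace-separated fields than it expected. One possible cause, offered here only as a guess, is a sample name that contains spaces. A minimal sketch for locating such rows without pandas follows. The function name `find_ragged_rows` and the example rows are hypothetical, and the line numbers it reports count every line of the input, which may not match how pandas counts if the file contains blank lines.

```python
def find_ragged_rows(lines):
    """Return (line_number, field_count, text) for every row whose
    whitespace-separated field count differs from the first non-empty row.
    """
    expected = None
    bad = []
    for lineno, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue  # ignore blank lines
        if expected is None:
            expected = len(fields)  # first data row sets the expectation
        elif len(fields) != expected:
            bad.append((lineno, len(fields), line.rstrip("\n")))
    return bad


# Made-up example: the second row has a sample name containing spaces,
# so it splits into 5 fields instead of 3.
example = [
    "a_1 GCATG TCGCAGG\n",
    "my sample 2 GCATG CTCTGCA\n",
    "a_3 GCATG CCTAGGT\n",
]
for lineno, nfields, text in find_ragged_rows(example):
    print(f"line {lineno}: {nfields} fields: {text}")
# line 2: 5 fields: my sample 2 GCATG CTCTGCA
```

To check a real file, the same function can be given an open file handle, for example `find_ragged_rows(open("barcodes.txt"))`, where `barcodes.txt` stands in for the actual path.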

c-mccarron commented 2 years ago

I switched around the name on line 188 and it finally worked. Thank you for your advice. Now Step 3 is giving me this error message, though:

Step 3: Clustering/Mapping reads within samples
[####################] 100% 0:00:00 | indexing reference
[####################] 100% 0:00:01 | dereplicating

Encountered an Error. Message: IPyradError:

Fatal error: FASTQ input is only allowed with the fastx_uniques command

Parallel connection closed.

I haven't edited line 14 of the params file, so it is set to the default of 0.85. I'm not entirely sure what is going on here.

isaacovercast commented 2 years ago

Glad you got step 1 working.

The step 3 error is fixed in the most recent version of ipyrad: https://github.com/dereneaton/ipyrad/issues/473

Please update to the latest version, 0.9.84, and try again.

I'm closing this issue, because the original issue was fixed. If you still have problems with step 3 and fastx_uniques please update issue 473 (linked above). Thanks!
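For anyone unsure whether their environment already has the version mentioned above, here is a small sketch that reads the installed package version with Python's standard `importlib.metadata` (Python 3.8 or newer) and compares it with 0.9.84. The helper names are made up for this example, and the comparison assumes a plain dotted numeric version string.

```python
import re
from importlib.metadata import PackageNotFoundError, version

MIN_VERSION = "0.9.84"  # version named in this thread


def parse_version(text):
    """Turn '0.9.84' into (0, 9, 84). Assumes dotted numeric versions."""
    return tuple(int(part) for part in re.findall(r"\d+", text))


def is_at_least(installed, minimum=MIN_VERSION):
    """True if the installed version is the minimum or newer."""
    return parse_version(installed) >= parse_version(minimum)


def installed_ipyrad_version():
    """Return the installed ipyrad version string, or None if absent."""
    try:
        return version("ipyrad")
    except PackageNotFoundError:
        return None


found = installed_ipyrad_version()
if found is None:
    print("ipyrad is not installed in this environment")
elif is_at_least(found):
    print(f"ipyrad {found} is at least {MIN_VERSION}")
else:
    print(f"ipyrad {found} is older than {MIN_VERSION}")
```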