broadinstitute / CP186-A549-WG

BSD 3-Clause "New" or "Revised" License

Process spots error #1

Closed gwaybio closed 2 years ago

gwaybio commented 3 years ago

A new error appeared!

Now processing spots for CP186D-Well2-15...part of set ALLBATCHES___ALLPLATES___ALLWELLS
Now processing spots for CP186D-Well2-16...part of set ALLBATCHES___ALLPLATES___ALLWELLS
Traceback (most recent call last):
  File "recipe/0.preprocess-sites/1.process-spots.py", line 156, in <module>
    foci_df = pd.read_csv(foci_file)
  File "/home/ubuntu/miniconda3/envs/pooled-cell-painting/lib/python3.7/site-packages/pandas/io/parsers.py", line 610, in read_csv
    return _read(filepath_or_buffer, kwds)
  File "/home/ubuntu/miniconda3/envs/pooled-cell-painting/lib/python3.7/site-packages/pandas/io/parsers.py", line 468, in _read
    return parser.read(nrows)
  File "/home/ubuntu/miniconda3/envs/pooled-cell-painting/lib/python3.7/site-packages/pandas/io/parsers.py", line 1057, in read
    index, columns, col_dict = self._engine.read(nrows)
  File "/home/ubuntu/miniconda3/envs/pooled-cell-painting/lib/python3.7/site-packages/pandas/io/parsers.py", line 2036, in read
    data = self._reader.read(nrows)
  File "pandas/_libs/parsers.pyx", line 756, in pandas._libs.parsers.TextReader.read
  File "pandas/_libs/parsers.pyx", line 771, in pandas._libs.parsers.TextReader._read_low_memory
  File "pandas/_libs/parsers.pyx", line 827, in pandas._libs.parsers.TextReader._read_rows
  File "pandas/_libs/parsers.pyx", line 814, in pandas._libs.parsers.TextReader._tokenize_rows
  File "pandas/_libs/parsers.pyx", line 1951, in pandas._libs.parsers.raise_parser_error
pandas.errors.ParserError: Error tokenizing data. C error: Expected 1011 fields in line 3, saw 1021

@ErinWeisbart - any idea what's going on? It looks like the foci data for site CP186D-Well2-16 is corrupted somehow. Would you know off the top of your head why?
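One way to check whether a file really is malformed, as the `Expected 1011 fields in line 3, saw 1021` message suggests, is to count the fields on each row without going through pandas. The sketch below is only an illustration: `field_count_report` is a hypothetical helper, not part of the recipe, and it assumes the foci file is a plain comma-delimited CSV.

```python
import csv
from collections import Counter


def field_count_report(path, delimiter=","):
    """Hypothetical helper: summarize how many fields each row has.

    Returns (counts, bad_rows), where counts maps a field count to the
    number of rows with that count, and bad_rows lists the 1-based row
    numbers whose field count differs from the header row's.
    """
    counts = Counter()
    bad_rows = []
    header_width = None
    with open(path, newline="") as handle:
        reader = csv.reader(handle, delimiter=delimiter)
        # Row numbers match physical line numbers as long as no quoted
        # field contains an embedded newline.
        for row_number, row in enumerate(reader, start=1):
            if header_width is None:
                header_width = len(row)
            counts[len(row)] += 1
            if len(row) != header_width:
                bad_rows.append(row_number)
    return counts, bad_rows
```

If `bad_rows` is non-empty for a site, the file on disk is likely truncated or interleaved; if it is empty, the problem may lie elsewhere (for example, the file changing while it is being read).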

ErinWeisbart commented 3 years ago

Logs for that site look normal, so I don't know. Working at genome scale, I wouldn't be surprised to lose a site or two to a corrupted or incomplete file. My suggestion would be to update the recipe so that it skips ParserErrors here but logs them, so we can check how many times it happens. Or, to avoid having something to check manually: each ParserError could increment a counter, and if the counter hits 5 (a number I arbitrarily chose just now), the run stops.
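The skip-log-and-count idea could look roughly like the sketch below. This is not the recipe's code: `read_foci_files`, the `foci_files` mapping of site name to file path, and `MAX_PARSER_ERRORS` are all hypothetical names, and the threshold of 5 is the arbitrary number from the comment above.

```python
import logging

import pandas as pd

logger = logging.getLogger(__name__)

MAX_PARSER_ERRORS = 5  # arbitrary threshold, as suggested in the thread


def read_foci_files(foci_files, max_errors=MAX_PARSER_ERRORS):
    """Hypothetical helper: read each site's foci CSV, skipping bad files.

    foci_files maps a site name to a CSV path. Returns (frames, failed),
    where frames maps site name to DataFrame and failed lists skipped sites.
    Raises RuntimeError once max_errors files have failed to parse.
    """
    frames = {}
    failed = []
    for site, foci_file in foci_files.items():
        try:
            frames[site] = pd.read_csv(foci_file)
        except pd.errors.ParserError as err:
            failed.append(site)
            logger.warning("Skipping %s (%s): %s", site, foci_file, err)
            if len(failed) >= max_errors:
                raise RuntimeError(
                    f"{len(failed)} foci files failed to parse: {failed}"
                ) from err
    return frames, failed
```

Returning the list of skipped sites alongside the data keeps the failures visible at the end of a run, while the threshold stops the run early if the corruption turns out to be widespread rather than a one-off.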

gwaybio commented 3 years ago

Sounds good, thanks @ErinWeisbart - I added broadinstitute/pooled-cell-painting-profiling-recipe#79 to track this fix.

ErinWeisbart commented 2 years ago

I'm now running CP186 through the recipe. It proceeded past CP186D-Well2-16 with no problem, but it looks like it errored on CP186G-Well2-66. (Unfortunately, I don't have the stack trace.)

ErinWeisbart commented 2 years ago

On another run, it again passed CP186D-Well2-16 just fine but errored at CP186D-Well5-31 (which it passed fine last time). So that's another indication that the failure is stochastic.

ErinWeisbart commented 2 years ago

I thought I had fixed the error handling (spoiler alert: I hadn't) and ran again. It errored at the same site this time, and I think that, at least this time, it has to do with the files not being unarchived. (Though FWIW, I believe the file restoration error is stochastic.)

ErinWeisbart commented 2 years ago

I believe switching to bare excepts in https://github.com/broadinstitute/pooled-cell-painting-profiling-recipe/commit/cbde4011ace3476005b87d6661a9996a0129aa61 handles this error (though I also re-unarchived the files before running). Regardless, I'm closing this, since it is no longer a problem for CP186: we've progressed beyond this point in running the recipe.
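For reference, an `except:` clause with no exception type also catches `KeyboardInterrupt` and `SystemExit`, which can make a long run hard to stop. A narrower alternative, sketched below, catches only the errors a missing, empty, or malformed file would plausibly raise. This is an assumption-laden illustration and not what the linked commit does: `try_read_foci` and `READ_ERRORS` are hypothetical names, and the choice of exceptions is a guess at the failure modes discussed in this thread.

```python
import pandas as pd

# Errors that a missing, empty, or malformed CSV can raise in pd.read_csv.
READ_ERRORS = (
    pd.errors.ParserError,
    pd.errors.EmptyDataError,
    FileNotFoundError,
)


def try_read_foci(foci_file):
    """Hypothetical helper: return (DataFrame, None) or (None, reason)."""
    try:
        return pd.read_csv(foci_file), None
    except READ_ERRORS as err:
        # Keep the exception type in the reason so that un-restored files
        # can be told apart from corrupted ones in the logs.
        return None, f"{type(err).__name__}: {err}"
```

Recording the exception type would help separate the two failure modes seen here: files that were never unarchived versus files that exist but fail to tokenize.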