Open maxshafer opened 3 years ago
Been looking at this and it's strange to me. It's something to do with the separator when reading in the csv file (pd.read_csv). If you change line 169 (or 180) in tracks.py to
data_s = pd.read_csv(os.path.join(folder, file), sep='/')
you can load in the file, and then I'm able to separate the columns using
str.split(',', expand=True)
without problems. There doesn't seem to be anything on the particular line(s) that the error spits out that is any different from the adjacent lines.
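For anyone wanting to try this, here is a minimal sketch of the workaround on a made-up four-row stand-in for an als file (the data and the variable names are my own assumptions, not from tracks.py). Reading with a separator that never occurs in the file gives one string column, which can then be split and checked for rows with the wrong number of fields.

```python
import io

import pandas as pd

# Made-up stand-in for an als file; the third data line has 8 fields instead of 5.
raw = "\n".join([
    ",tv_ns,speed_mm,x_nt,y_nt",
    "0,100,0.5,135,730",
    "1,200,0.8,136,730",
    "0.8,157,729,2,300,0.0,157,729",
    "3,400,0.8,156,729",
]) + "\n"

# '/' never occurs in the data, so every line lands in a single string column.
one_col = pd.read_csv(io.StringIO(raw), sep="/", header=None, names=["line"])

# Split on commas afterwards; short rows are padded with missing values.
fields = one_col["line"].str.split(",", expand=True)

# Count fields per line to find the rows that would break a normal read.
n_fields = one_col["line"].str.count(",") + 1
bad_rows = one_col[n_fields != 5]
print(bad_rows)
```

Note that everything comes back as strings this way, so the columns would still need converting to numbers afterwards.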
Thanks for picking up on this! It's so strange that this has started happening and that there doesn't seem to be a difference between the lines that work and the ones that don't... I'll add in your solution :)
OK, so it is definitely from the QA I've been doing. Here's an alternative solution maybe (using error_bad_lines=False in the read_csv call). It skips the bad lines (usually 1 or 2 per file).
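A small sketch of the skip-bad-lines approach, with one caveat: error_bad_lines was deprecated in pandas 1.3 in favour of on_bad_lines and removed in pandas 2.0, so the call depends on the installed version. The helper name and the sample data below are hypothetical, not part of tracks.py.

```python
import io

import pandas as pd


def read_csv_skipping_bad_lines(source, **kwargs):
    """Read a CSV, dropping lines that have too many fields (hypothetical helper)."""
    try:
        # pandas >= 1.3
        return pd.read_csv(source, on_bad_lines="skip", **kwargs)
    except TypeError:
        # Older pandas does not know on_bad_lines, so fall back to the old flag.
        return pd.read_csv(source, error_bad_lines=False, **kwargs)


# Made-up stand-in for an als file; the third data line has 8 fields instead of 5.
raw = "\n".join([
    ",tv_ns,speed_mm,x_nt,y_nt",
    "0,100,0.5,135,730",
    "1,200,0.8,136,730",
    "0.8,157,729,2,300,0.0,157,729",
    "3,400,0.8,156,729",
]) + "\n"

data = read_csv_skipping_bad_lines(io.StringIO(raw), index_col=0)
print(len(data))
```

The skipped rows are simply gone from the result, so it is worth logging how many were dropped per file.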
Here's the output from running the for loop in the load_als_files function with the above change:
b'Skipping line 4348535: expected 5 fields, saw 8\nSkipping line 4388263: expected 5 fields, saw 6\nSkipping line 4407123: expected 5 fields, saw 7\n'
loaded file FISH20210428_c6_r0_Altolamprologus-shell_su_als.csv
b'Skipping line 4333066: expected 5 fields, saw 8\n'
loaded file FISH20210428_c5_r1_Altolamprologus-shell_su_als.csv
b'Skipping line 4375129: expected 5 fields, saw 6\n'
b'Skipping line 4475525: expected 5 fields, saw 6\n'
loaded file FISH20210505_c2_r1_Altolamprologus-shell_su_als.csv
b'Skipping line 4683840: expected 5 fields, saw 8\n'
loaded file FISH20210428_c5_r0_Altolamprologus-shell_su_als.csv
loaded file FISH20210505_c1_r0_Altolamprologus-shell_su_als.csv
loaded file FISH20210505_c1_r1_Altolamprologus-shell_su_als.csv
loaded file FISH20210505_c2_r0_Altolamprologus-shell_su_als.csv
Very clearly just the als files I QA'd, with either split_tracks or divide_tracking.
If you make the change make sure it is on both line 168 and 180.
Hmm, thanks. The error_bad_lines option is useful, but I was cautious about putting it in without knowing why the lines are wrong. Great to have the problem linked to the QA'd files. I'll check on them.
Hey, reading the same file into R I get this warning message:
Warning message: In scan(file = file, what = what, sep = sep, quote = quote, dec = dec, : embedded nul(s) found in input
And the data frame looks like this (offending line is 4333066):
                   X        tv_ns    speed_mm x_nt y_nt
4333060 4.333059e+06 5.186074e+14   3.3963970  135  730
4333061 4.333060e+06 5.186075e+14   3.3963970  136  730
4333062 4.333061e+06 5.186076e+14   2.5472977  136  730
4333063 4.333062e+06 5.186077e+14   1.6981985  136  730
4333064 4.333063e+06 5.186078e+14   0.8490992  136  730
4333065 4.333064e+06 5.186079e+14   0.8490992  136  730
4333066 8.490992e-01 1.570000e+02 729.0000000   NA   NA
4333067 4.471860e+06 5.324482e+14   0.0000000  157  729
4333068 4.471861e+06 5.324483e+14   0.8490992  157  729
4333069 4.471862e+06 5.324484e+14   0.8490992  156  729
4333070 4.471863e+06 5.324485e+14   0.8490992  156  729
Maybe that helps, dunno!
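Since R reports embedded nul(s), one quick way to check a file from Python is to scan it in binary mode for NUL bytes. This is only a sketch: find_nul_lines is a hypothetical helper, and the demo file is made up with a NUL planted on its third line.

```python
import os
import tempfile


def find_nul_lines(path):
    """Return the 1-based numbers of lines that contain a NUL byte."""
    hits = []
    with open(path, "rb") as fh:  # binary mode, so nothing is decoded or silently dropped
        for line_no, raw in enumerate(fh, start=1):
            if b"\x00" in raw:
                hits.append(line_no)
    return hits


# Tiny made-up file with a NUL byte planted on line 3.
with tempfile.NamedTemporaryFile("wb", suffix=".csv", delete=False) as tmp:
    tmp.write(b",tv_ns,speed_mm,x_nt,y_nt\n0,100,0.5,135,730\n1,2\x0000,0.8,136,730\n")

demo_hits = find_nul_lines(tmp.name)
os.remove(tmp.name)
print(demo_hits)
```

If the line numbers it reports match the ones pandas skips, that would point at the NUL bytes as the cause.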
Oooo! Useful, thanks :)
Changing line 351 in 'run_fish_als.py' to:
als_df.to_csv(os.path.join(rootdir, "{}_als.csv".format(fish_ID)), encoding='utf-8-sig')
seems to fix the problem. Best guess is that the divide_tracking and split_tracking.py runs create some weird character that can't be encoded correctly by run_fish_als.py when it is using 'to_csv'.
Also happens for me:
/Users/annikanichols/Documents/cichlid-analysis/.venv/bin/python3 /Users/annikanichols/Documents/cichlid-analysis/cichlidanalysis/analysis/run_combine_als.py
2021-06-24 19:01:01.619 Python[56341:25724689] WARNING: <NSOpenPanel: 0x7fd5320d6ef0> running implicitly; please run panels using NSSavePanel rather than NSApplication.
b'Skipping line 5122268: expected 5 fields, saw 6\n'
loaded file FISH20210609_c1_r0_Julidochromis-ornatus_su_als.csv
b'Skipping line 4620509: expected 5 fields, saw 8\nSkipping line 4639163: expected 5 fields, saw 8\nSkipping line 4714433: expected 5 fields, saw 8\n'
b'Skipping line 4861739: expected 5 fields, saw 8\n'
loaded file FISH20210609_c1_r1_Julidochromis-ornatus_su_als.csv
b'Skipping line 4991185: expected 5 fields, saw 7\n'
b'Skipping line 4490625: expected 5 fields, saw 6\nSkipping line 4546975: expected 5 fields, saw 6\n'
So, this error must come from the run_fish_als script, as sometimes it works if run again. I'm adding in a function that checks whether the saved-out als file is correct and, if not, tries to save it using np.savetxt instead. The pd.to_csv call also has the encoding='utf-8-sig' parameter.
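Roughly what such a check could look like, as a sketch rather than the code that went into run_fish_als.py. All function names here are hypothetical, and it assumes the als data frame is entirely numeric with a default index, so counting commas per line is a fair test of the field count.

```python
import os
import tempfile

import numpy as np
import pandas as pd


def csv_is_well_formed(path, n_fields):
    """True if every line has exactly n_fields comma-separated fields and no NUL bytes."""
    with open(path, "rb") as fh:
        for raw in fh:
            if b"\x00" in raw or raw.count(b",") != n_fields - 1:
                return False
    return True


def save_with_numpy(df, path):
    """Fallback writer: index plus numeric columns via np.savetxt."""
    table = df.reset_index().to_numpy(dtype=float)
    header = ",".join([""] + [str(c) for c in df.columns])  # empty name for the index
    np.savetxt(path, table, delimiter=",", header=header, comments="")


def save_als_checked(df, path):
    """Save with to_csv, verify the file, and fall back to np.savetxt if it is malformed."""
    n_fields = df.shape[1] + 1  # data columns plus the index column
    df.to_csv(path, encoding="utf-8-sig")
    if csv_is_well_formed(path, n_fields):
        return "to_csv"
    save_with_numpy(df, path)
    return "savetxt" if csv_is_well_formed(path, n_fields) else "failed"


# Made-up two-row data frame with the same column names as the als files.
demo_df = pd.DataFrame({"tv_ns": [100, 200], "speed_mm": [0.5, 0.8],
                        "x_nt": [135, 136], "y_nt": [730, 730]})
with tempfile.TemporaryDirectory() as tmpdir:
    demo_path = os.path.join(tmpdir, "demo_als.csv")
    how_saved = save_als_checked(demo_df, demo_path)
    save_with_numpy(demo_df, demo_path)  # exercise the fallback writer as well
    numpy_ok = csv_is_well_formed(demo_path, 5)
print(how_saved, numpy_ok)
```

One difference to be aware of: np.savetxt writes every value in float notation, so integer columns such as x_nt would come back as floats when reloaded.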
Reran c1_r0 and then the rest of the erroring ones shown below:
/Users/annikanichols/Documents/cichlid-analysis/.venv/bin/python3 /Users/annikanichols/Documents/cichlid-analysis/cichlidanalysis/analysis/run_combine_als.py
2021-06-30 16:55:00.225 Python[7295:162591] WARNING: <NSOpenPanel: 0x7ff0347d2310> running implicitly; please run panels using NSSavePanel rather than NSApplication.
loaded file FISH20210609_c1_r0_Julidochromis-ornatus_su_als.csv
b'Skipping line 4620509: expected 5 fields, saw 8\nSkipping line 4639163: expected 5 fields, saw 8\nSkipping line 4714433: expected 5 fields, saw 8\n'
b'Skipping line 4861739: expected 5 fields, saw 8\n'
b'Skipping line 4991185: expected 5 fields, saw 7\n'
loaded file FISH20210609_c1_r1_Julidochromis-ornatus_su_als.csv
b'Skipping line 4490625: expected 5 fields, saw 6\nSkipping line 4546975: expected 5 fields, saw 6\n'
loaded file FISH20210609_c2_r0_Julidochromis-ornatus_su_als.csv
loaded file FISH20210609_c2_r1_Julidochromis-ornatus_su_als.csv
loaded file FISH20210609_c3_r0_Julidochromis-ornatus_su_als.csv
loaded file FISH20210609_c3_r1_Julidochromis-ornatus_su_als.csv
b'Skipping line 4508145: expected 5 fields, saw 6\n'
b'Skipping line 4601161: expected 5 fields, saw 6\n'
loaded file FISH20210609_c4_r0_Julidochromis-ornatus_su_als.csv
loaded file FISH20210609_c4_r1_Julidochromis-ornatus_su_als.csv
loaded file FISH20210609_c5_r1_Julidochromis-ornatus_su_als.csv
loaded file FISH20210609_c6_r0_Julidochromis-ornatus_su_als.csv
loaded file FISH20210609_c6_r1_Julidochromis-ornatus_su_als.csv
So the problem ones:
loaded file FISH20210609_c1_r0_Julidochromis-ornatus_su_als.csv
loaded file FISH20210609_c1_r1_Julidochromis-ornatus_su_als.csv
loaded file FISH20210609_c2_r0_Julidochromis-ornatus_su_als.csv
loaded file FISH20210609_c4_r0_Julidochromis-ornatus_su_als.csv
Another bug - running combine als on Altshe data that has been retracked then divide_tracking. Also tried re-making als files (running fish_als again) to no avail.
/usr/local/bin/python3.7 /Users/maxwellshafer/cichlid-analysis/cichlidanalysis/analysis/run_combine_als.py
2021-06-10 12:58:45.398 Python[27248:1217134] WARNING: <NSOpenPanel: 0x7f8552f71cf0> running implicitly; please run panels using NSSavePanel rather than NSApplication.
Traceback (most recent call last):
File "/Users/maxwellshafer/cichlid-analysis/cichlidanalysis/analysis/run_combine_als.py", line 47, in
fish_tracks = load_als_files(rootdir)
File "/Users/maxwellshafer/cichlid-analysis/cichlidanalysis/io/tracks.py", line 180, in load_als_files
data = pd.read_csv(os.path.join(folder, file), sep=',')
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pandas/io/parsers.py", line 686, in read_csv
return _read(filepath_or_buffer, kwds)
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pandas/io/parsers.py", line 458, in _read
data = parser.read(nrows)
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pandas/io/parsers.py", line 1196, in read
ret = self._engine.read(nrows)
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pandas/io/parsers.py", line 2155, in read
data = self._reader.read(nrows)
File "pandas/_libs/parsers.pyx", line 847, in pandas._libs.parsers.TextReader.read
File "pandas/_libs/parsers.pyx", line 862, in pandas._libs.parsers.TextReader._read_low_memory
File "pandas/_libs/parsers.pyx", line 918, in pandas._libs.parsers.TextReader._read_rows
File "pandas/_libs/parsers.pyx", line 905, in pandas._libs.parsers.TextReader._tokenize_rows
File "pandas/_libs/parsers.pyx", line 2042, in pandas._libs.parsers.raise_parser_error
pandas.errors.ParserError: Error tokenizing data. C error: Expected 5 fields in line 4683840, saw 8
Process finished with exit code 1
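To look at the raw bytes around a line number from an error like the one above, something along these lines might help. It is a sketch with a hypothetical helper name and a made-up demo file. I'm assuming the reported line number is 1-based and counts the header row, which is worth double-checking against the file.

```python
import os
import tempfile
from itertools import islice


def lines_around(path, line_no, context=2):
    """Return (line number, raw bytes) pairs for line_no and its neighbours."""
    first = max(1, line_no - context)
    last = line_no + context
    with open(path, "rb") as fh:
        # islice uses 0-based positions, so line `first` sits at index first - 1.
        window = islice(fh, first - 1, last)
        return [(first + i, raw) for i, raw in enumerate(window)]


# Made-up ten-line file standing in for an als csv.
with tempfile.NamedTemporaryFile("wb", suffix=".csv", delete=False) as tmp:
    for n in range(1, 11):
        tmp.write("row{}\n".format(n).encode("utf-8"))

demo_window = lines_around(tmp.name, 5, context=1)
os.remove(tmp.name)
for number, raw in demo_window:
    print(number, raw)  # printing bytes shows hidden characters such as \x00
```

Reading in binary and printing the bytes means odd characters show up as escape codes instead of being hidden by the editor.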