annnic / cichlid-analysis

cichlid behaviour analysis
GNU General Public License v3.0

Another bug with 'run_combine_als' #31

Open maxshafer opened 3 years ago

maxshafer commented 3 years ago

Another bug: running run_combine_als on Altshe data that has been retracked and then put through divide_tracking. I also tried re-making the als files (running fish_als again), to no avail.

/usr/local/bin/python3.7 /Users/maxwellshafer/cichlid-analysis/cichlidanalysis/analysis/run_combine_als.py
2021-06-10 12:58:45.398 Python[27248:1217134] WARNING: <NSOpenPanel: 0x7f8552f71cf0> running implicitly; please run panels using NSSavePanel rather than NSApplication.
Traceback (most recent call last):
  File "/Users/maxwellshafer/cichlid-analysis/cichlidanalysis/analysis/run_combine_als.py", line 47, in <module>
    fish_tracks = load_als_files(rootdir)
  File "/Users/maxwellshafer/cichlid-analysis/cichlidanalysis/io/tracks.py", line 180, in load_als_files
    data = pd.read_csv(os.path.join(folder, file), sep=',')
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pandas/io/parsers.py", line 686, in read_csv
    return _read(filepath_or_buffer, kwds)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pandas/io/parsers.py", line 458, in _read
    data = parser.read(nrows)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pandas/io/parsers.py", line 1196, in read
    ret = self._engine.read(nrows)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pandas/io/parsers.py", line 2155, in read
    data = self._reader.read(nrows)
  File "pandas/_libs/parsers.pyx", line 847, in pandas._libs.parsers.TextReader.read
  File "pandas/_libs/parsers.pyx", line 862, in pandas._libs.parsers.TextReader._read_low_memory
  File "pandas/_libs/parsers.pyx", line 918, in pandas._libs.parsers.TextReader._read_rows
  File "pandas/_libs/parsers.pyx", line 905, in pandas._libs.parsers.TextReader._tokenize_rows
  File "pandas/_libs/parsers.pyx", line 2042, in pandas._libs.parsers.raise_parser_error
pandas.errors.ParserError: Error tokenizing data. C error: Expected 5 fields in line 4683840, saw 8

Process finished with exit code 1

maxshafer commented 3 years ago

I've been looking at this and it's strange to me: it has something to do with the separator when reading in the csv file (pd.read_csv). If you change line 169 (or 180) in tracks.py to

" data_s = pd.read_csv(os.path.join(folder, file), sep='/') "

you can load in the file, then I'm able to separate the columns using

str.split(',', expand=True)

without problems. There doesn't seem to be anything on the particular line(s) that the error spits out that is any different from the adjacent lines.
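A minimal sketch of that workaround, assuming '/' never appears in the file so that each row loads as one string, which is then split on commas. The helper name `load_single_column_then_split` and the sample rows are made up for illustration and are not from tracks.py.

```python
import io

import pandas as pd


def load_single_column_then_split(source, dummy_sep="/"):
    """Read each row as a single string, then split it on commas (hypothetical helper)."""
    # header=None keeps the header line as an ordinary row, so it gets split as well.
    raw = pd.read_csv(source, sep=dummy_sep, header=None, dtype=str)
    return raw[0].str.split(",", expand=True)


# Illustrative data: the third data row is malformed (8 fields instead of 5).
sample = io.StringIO(
    ",tv_ns,speed_mm,x_nt,y_nt\n"
    "0,100,0.5,135,730\n"
    "1,200,0.8,0.8,157,729,1,2\n"
    "2,300,0.8,136,730\n"
)

split_df = load_single_column_then_split(sample)

# Rows that do not have exactly 5 populated fields are the suspicious ones.
n_fields = split_df.notna().sum(axis=1)
bad_rows = split_df[n_fields != 5]
print(bad_rows)
```

Splitting after loading does not repair the malformed rows, but it does let you look at them instead of having the parser stop on the first one.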

annnic commented 3 years ago

Thanks for picking up on this. It's so strange that this has started happening and that there doesn't seem to be a difference between the lines that work and the ones that don't... I'll add in your solution :)

maxshafer commented 3 years ago

OK, so it is definitely from the QA I've been doing. Here's an alternative solution maybe (using error_bad_lines=False in the read_csv call). It skips the bad lines (usually 1 or 2 per file).

Here's the output from running the for loop in the load_als_files function with the above change:

b'Skipping line 4348535: expected 5 fields, saw 8\nSkipping line 4388263: expected 5 fields, saw 6\nSkipping line 4407123: expected 5 fields, saw 7\n'
loaded file FISH20210428_c6_r0_Altolamprologus-shell_su_als.csv
b'Skipping line 4333066: expected 5 fields, saw 8\n'
loaded file FISH20210428_c5_r1_Altolamprologus-shell_su_als.csv
b'Skipping line 4375129: expected 5 fields, saw 6\n'
b'Skipping line 4475525: expected 5 fields, saw 6\n'
loaded file FISH20210505_c2_r1_Altolamprologus-shell_su_als.csv
b'Skipping line 4683840: expected 5 fields, saw 8\n'
loaded file FISH20210428_c5_r0_Altolamprologus-shell_su_als.csv
loaded file FISH20210505_c1_r0_Altolamprologus-shell_su_als.csv
loaded file FISH20210505_c1_r1_Altolamprologus-shell_su_als.csv
loaded file FISH20210505_c2_r0_Altolamprologus-shell_su_als.csv

Very clearly just the als files I QA'd, with either split_tracks or divide_tracking.

If you make the change, make sure it is on both lines 168 and 180.
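A note for newer pandas versions: `error_bad_lines` was deprecated in pandas 1.3 and removed in 2.0, where `on_bad_lines` replaces it. Below is a small sketch of the equivalent call, assuming pandas 1.3 or later; the sample data is made up for illustration.

```python
import io

import pandas as pd

# Illustrative data: the second data row has 8 fields instead of 5.
sample = io.StringIO(
    ",tv_ns,speed_mm,x_nt,y_nt\n"
    "0,100,0.5,135,730\n"
    "1,200,0.8,0.8,157,729,1,2\n"
    "2,300,0.8,136,730\n"
)

# on_bad_lines="skip" drops rows with too many fields, as error_bad_lines=False did.
# on_bad_lines="warn" also drops them but reports each skipped line.
data = pd.read_csv(sample, sep=",", on_bad_lines="skip")
print(len(data))
```

As with `error_bad_lines=False`, this hides the malformed rows rather than explaining them, so it is best paired with a check on why they appear.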

annnic commented 3 years ago

Hmm, thanks. The error_bad_lines option is useful, but I was cautious about putting it in without knowing why the lines are wrong. Great to have the problem linked to the QA'd files. I'll check on them.

maxshafer commented 3 years ago

Hey, reading the same file into R I get this warning message:

Warning message:
In scan(file = file, what = what, sep = sep, quote = quote, dec = dec, :
  embedded nul(s) found in input

And the data frame looks like this (offending line is 4333066):

                   X        tv_ns    speed_mm x_nt y_nt
4333060 4.333059e+06 5.186074e+14   3.3963970  135  730
4333061 4.333060e+06 5.186075e+14   3.3963970  136  730
4333062 4.333061e+06 5.186076e+14   2.5472977  136  730
4333063 4.333062e+06 5.186077e+14   1.6981985  136  730
4333064 4.333063e+06 5.186078e+14   0.8490992  136  730
4333065 4.333064e+06 5.186079e+14   0.8490992  136  730
4333066 8.490992e-01 1.570000e+02 729.0000000   NA   NA
4333067 4.471860e+06 5.324482e+14   0.0000000  157  729
4333068 4.471861e+06 5.324483e+14   0.8490992  157  729
4333069 4.471862e+06 5.324484e+14   0.8490992  156  729
4333070 4.471863e+06 5.324485e+14   0.8490992  156  729

Maybe that helps, dunno!
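Since R reports embedded nul(s), one way to locate the offending rows from Python is to scan the raw bytes. This is only a sketch: `find_suspect_lines` is a hypothetical helper, not something in the repository, and the demo file is made up. It flags lines that contain a NUL byte or that do not have the expected 5 comma-separated fields.

```python
import os
import tempfile


def find_suspect_lines(path, expected_fields=5):
    """Return (line_number, reason) for lines with a NUL byte or an unexpected field count."""
    suspects = []
    with open(path, "rb") as handle:
        for line_number, raw in enumerate(handle, start=1):
            if b"\x00" in raw:
                suspects.append((line_number, "embedded NUL"))
            elif raw.strip() and raw.count(b",") + 1 != expected_fields:
                suspects.append((line_number, "field count"))
    return suspects


# Demo on a small made-up file: line 3 holds NUL bytes, line 4 has 6 fields.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo_als.csv")
    with open(demo_path, "wb") as handle:
        handle.write(b",tv_ns,speed_mm,x_nt,y_nt\n")
        handle.write(b"0,100,0.5,135,730\n")
        handle.write(b"1,200,\x00\x000.8,157,729\n")
        handle.write(b"2,300,0.8,0.8,157,729\n")
    suspects = find_suspect_lines(demo_path)

print(suspects)
```

Reading in binary mode avoids any decoding step, so stray bytes show up as they are on disk.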

annnic commented 3 years ago

Oooo! Useful, thanks :)

maxshafer commented 3 years ago

Changing line 351 in 'run_fish_als.py' to:

als_df.to_csv(os.path.join(rootdir, "{}_als.csv".format(fish_ID)), encoding='utf-8-sig')

seems to fix the problem. My best guess is that the divide_tracking and split_tracking.py runs create some odd character that run_fish_als.py can't encode correctly when it uses 'to_csv'.

annnic commented 3 years ago

Also happens for me:

/Users/annikanichols/Documents/cichlid-analysis/.venv/bin/python3 /Users/annikanichols/Documents/cichlid-analysis/cichlidanalysis/analysis/run_combine_als.py
2021-06-24 19:01:01.619 Python[56341:25724689] WARNING: <NSOpenPanel: 0x7fd5320d6ef0> running implicitly; please run panels using NSSavePanel rather than NSApplication.
b'Skipping line 5122268: expected 5 fields, saw 6\n'
loaded file FISH20210609_c1_r0_Julidochromis-ornatus_su_als.csv
b'Skipping line 4620509: expected 5 fields, saw 8\nSkipping line 4639163: expected 5 fields, saw 8\nSkipping line 4714433: expected 5 fields, saw 8\n'
b'Skipping line 4861739: expected 5 fields, saw 8\n'
loaded file FISH20210609_c1_r1_Julidochromis-ornatus_su_als.csv
b'Skipping line 4991185: expected 5 fields, saw 7\n'
b'Skipping line 4490625: expected 5 fields, saw 6\nSkipping line 4546975: expected 5 fields, saw 6\n'

annnic commented 3 years ago

So, this error must come from the run_fish_als script, as it sometimes works if run again. I'm adding a function that checks whether the saved als file is correct and, if it isn't, tries to save it using np.savetxt instead. The pd.to_csv call also gets the encoding='utf-8-sig' parameter.
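A rough sketch of what such a check-and-fallback could look like. This is not the function added to the repository: `als_file_is_valid` and `save_als_checked` are hypothetical names, and the validity test (no NUL bytes, the expected number of comma-separated fields on every line) is an assumption about what "correct" means here.

```python
import os
import tempfile

import numpy as np
import pandas as pd


def als_file_is_valid(path, expected_fields):
    """True if no line has a NUL byte and every non-empty line has the expected field count."""
    with open(path, "rb") as handle:
        return all(
            b"\x00" not in raw and raw.count(b",") + 1 == expected_fields
            for raw in handle
            if raw.strip()
        )


def save_als_checked(als_df, path):
    """Save with to_csv, verify the file, and fall back to np.savetxt if the check fails."""
    expected_fields = als_df.shape[1] + 1  # data columns plus the index column
    als_df.to_csv(path, encoding="utf-8-sig")
    if als_file_is_valid(path, expected_fields):
        return "to_csv"
    # Fallback: write index plus values as plain text with the same header layout.
    header = "," + ",".join(str(col) for col in als_df.columns)
    np.savetxt(path, als_df.reset_index().to_numpy(), delimiter=",",
               header=header, comments="", fmt="%s")
    if not als_file_is_valid(path, expected_fields):
        raise IOError("could not write a valid als file to {}".format(path))
    return "savetxt"


# Demo with made-up values using the column names from the als files.
demo_df = pd.DataFrame({"tv_ns": [100, 200, 300], "speed_mm": [0.5, 0.8, 0.8],
                        "x_nt": [135, 136, 136], "y_nt": [730, 730, 730]})
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo_als.csv")
    method = save_als_checked(demo_df, demo_path)
    reloaded = pd.read_csv(demo_path, index_col=0, encoding="utf-8-sig")

print(method, reloaded.shape)
```

Checking the file straight after writing means a bad save is caught in run_fish_als rather than later in run_combine_als.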

Reran c1_r0 and then the rest of the erroring ones shown below:

/Users/annikanichols/Documents/cichlid-analysis/.venv/bin/python3 /Users/annikanichols/Documents/cichlid-analysis/cichlidanalysis/analysis/run_combine_als.py
2021-06-30 16:55:00.225 Python[7295:162591] WARNING: <NSOpenPanel: 0x7ff0347d2310> running implicitly; please run panels using NSSavePanel rather than NSApplication.
loaded file FISH20210609_c1_r0_Julidochromis-ornatus_su_als.csv
b'Skipping line 4620509: expected 5 fields, saw 8\nSkipping line 4639163: expected 5 fields, saw 8\nSkipping line 4714433: expected 5 fields, saw 8\n'
b'Skipping line 4861739: expected 5 fields, saw 8\n'
b'Skipping line 4991185: expected 5 fields, saw 7\n'
loaded file FISH20210609_c1_r1_Julidochromis-ornatus_su_als.csv
b'Skipping line 4490625: expected 5 fields, saw 6\nSkipping line 4546975: expected 5 fields, saw 6\n'
loaded file FISH20210609_c2_r0_Julidochromis-ornatus_su_als.csv
loaded file FISH20210609_c2_r1_Julidochromis-ornatus_su_als.csv
loaded file FISH20210609_c3_r0_Julidochromis-ornatus_su_als.csv
loaded file FISH20210609_c3_r1_Julidochromis-ornatus_su_als.csv
b'Skipping line 4508145: expected 5 fields, saw 6\n'
b'Skipping line 4601161: expected 5 fields, saw 6\n'
loaded file FISH20210609_c4_r0_Julidochromis-ornatus_su_als.csv
loaded file FISH20210609_c4_r1_Julidochromis-ornatus_su_als.csv
loaded file FISH20210609_c5_r1_Julidochromis-ornatus_su_als.csv
loaded file FISH20210609_c6_r0_Julidochromis-ornatus_su_als.csv
loaded file FISH20210609_c6_r1_Julidochromis-ornatus_su_als.csv

So the problem ones:

loaded file FISH20210609_c1_r0_Julidochromis-ornatus_su_als.csv
loaded file FISH20210609_c1_r1_Julidochromis-ornatus_su_als.csv
loaded file FISH20210609_c2_r0_Julidochromis-ornatus_su_als.csv
loaded file FISH20210609_c4_r0_Julidochromis-ornatus_su_als.csv