OSOceanAcoustics / echopop

Generating biological estimates for animal "pop"ulation from echosounder data
https://echopop.readthedocs.io/
Apache License 2.0

Ensuring biological data column names match those in expected data files #261

Closed: brandynlucca closed this issue 2 days ago

brandynlucca commented 1 month ago

It seems that the files used for designing this version of echopop have entirely different column names than those generated by NWFSC. There is also some inconsistency in the spreadsheet column naming (and number of columns) across years. This needs to be amended since the data will otherwise be unreadable and yield an Error when trying to initialize the Survey-class object.

leewujung commented 1 month ago

Do you mean the files that you have been working with and the files generated from the ship?

I guess I am not sure what the difference is between "the files used for designing this version of echopop" and "those generated by NWFSC", since both are from NWFSC?

brandynlucca commented 1 month ago

This refers to the *.xlsx biological files used for post-processing. The *.xlsx files I have been using to develop the package have different column names than the actual files FEAT has used in the past for EchoPro.

leewujung commented 1 month ago

Interesting. I wonder how that happened. I know Emilio manually changed some column names, and at least some of the changes were recorded, but perhaps some were not. I think the files were generated from the database (maintained by Alicia in recent years), so this goes back to another issue (I'll try to find it): we need to settle on the column names we want in Echopop and let Alicia know (I think she said that when we last met with her).
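Once the names are settled, one way to bridge older files would be a small alias map applied before the data are read in. The sketch below is only an illustration: the source names (`Haul`, `Species_Code`, `Frequency`) and the helper `normalize_columns` are hypothetical and are not taken from the database or from Echopop.

```python
# Hypothetical aliases: database/export name -> name Echopop expects.
# The left-hand names are placeholders, not the real export headers.
COLUMN_ALIASES = {
    "Haul": "haul_num",
    "Species_Code": "species_id",
    "Frequency": "length_count",
}


def normalize_columns(columns, aliases=COLUMN_ALIASES):
    """Rename known aliases and leave every other column name unchanged."""
    cleaned = [str(name).strip() for name in columns]
    return [aliases.get(name, name) for name in cleaned]


# Example header row containing two aliased names and one padded name
renamed = normalize_columns(["Haul", "Species_Code", " sex "])
```

With a pandas DataFrame, the same map could be passed to `DataFrame.rename(columns=COLUMN_ALIASES)`.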

rebeccathomas-NOAA commented 1 month ago

I've got an e-mail out to Alicia about current database capabilities; I haven't heard back yet. In the meantime, I went through the biodata files and tried each one to see which ones had column naming issues.

biodata_catch: columns expected but not found: ['haul_weight', 'species_id', 'haul_num']
biodata_gear: fine
biodata_haul: fine
biodata_length: columns expected but not found: ['species_id', 'sex', 'haul_num', 'length_count', 'length']
biodata_specimen_ages: Missing columns in the Excel file ['sex', 'species_id', 'haul_num', 'weight', 'age', 'length']
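For reference, this kind of check can be sketched in a few lines of Python. It is a rough sketch under assumptions: `EXPECTED_COLUMNS` contains only the column names reported above (not Echopop's full schema), `find_missing_columns` is a hypothetical helper rather than Echopop's actual validation code, and the example header row is made up.

```python
# Column names taken from the report above; NOT the full expected schema.
EXPECTED_COLUMNS = {
    "biodata_catch": ["haul_num", "species_id", "haul_weight"],
    "biodata_length": ["haul_num", "species_id", "sex", "length", "length_count"],
    "biodata_specimen_ages": [
        "haul_num", "species_id", "sex", "length", "weight", "age",
    ],
}


def find_missing_columns(actual_columns, expected_columns):
    """Return the expected names that are absent from a sheet's header row."""
    # Exact, case-sensitive match after trimming surrounding whitespace
    present = {str(name).strip() for name in actual_columns}
    return [name for name in expected_columns if name not in present]


# Made-up header row standing in for a file with differently named columns
header = ["Haul", "Species_Code", "sex", "Length", "Frequency"]
missing = find_missing_columns(header, EXPECTED_COLUMNS["biodata_length"])
if missing:
    print(f"biodata_length: columns expected but not found: {missing}")
```

For a real file, the header row could be read without loading the data, for example with `pandas.read_excel(path, nrows=0).columns`.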

brandynlucca commented 2 days ago

This has been addressed in #268 and can therefore be closed.