StoXProject / RstoxData

R library for reading various biotic and acoustic data formats
https://stoxproject.github.io/RstoxData/
GNU Lesser General Public License v3.0

Inconsistent handling of serialnumber in StoX: does it have a different meaning in Biotics and StoX? #290

Status: Open. yreecht opened this issue 1 year ago.

yreecht commented 1 year ago

Hi,

I believe what follows is an issue with StoX rather than improperly entered data for some surveys in Biotics. It may also be linked to issue #267.

Here is a minimal example that reproduces the issue:


```r
library(dplyr)
library(RstoxData)

## Last snapshot downloaded from https://datasetexplorer.hi.no/apps/datasetexplorer/v2/navigation/Cruises/Forskningsfart%C3%B8y/2018/G.O.Sars_LMEL/2018102
dataFile <- "./biotic_cruiseNumber_2018102_G+O+Sars_2021-01-21T08.58.28.744Z.xml"

dataB <- RstoxData::ReadBiotic(FileNames = dataFile)
dataSB <- RstoxData::StoxBiotic(BioticData = dataB)

## Add serialnumber from the NMDBiotic data to the StoxBiotic tables (default settings)
dataSB <- RstoxData::AddToStoxBiotic(StoxBioticData = dataSB, BioticData = dataB,
                                     VariableNames = c("serialnumber"))

dataSB$Haul %>%
    filter(HaulKey %in% 60038:60039)

## Serial number is attached to the station:
dataSB$Station %>%
    filter(serialnumber %in% 60038:60039)

hh <- RstoxData::MergeStoxBiotic(StoxBioticData = dataSB, TargetTable = "Haul")

## Inconsistency between serialnumber and HaulKey (which corresponds to serialnumber in the Biotics file)
hh %>%
    filter(HaulKey %in% 60038:60039 |
           serialnumber %in% 60038:60039)

## Compare with the raw fishstation table: position, time, depth, ... above are wrong for the second haul of the station!
dataB[[1]]$fishstation %>%
    select(missiontype:longitudeend) %>%
    filter(serialnumber %in% 60038:60039)
```

This happens when several hauls were made at the same location, which BioticEditor calls a station (see the contextual help screenshots below).

[Screenshots: BioticEditor contextual help]

However, StoX seems to attach the serial number, which in Biotics is supposed to be unique to each sampling unit, to the station rather than to the haul.

As a workaround for now, I will use HaulKey (which otherwise seems to be consistently identical to serialnumber) wherever I used to use the serial number, and drop serialnumber. This is quite confusing, however, and leads to inconsistent data when merging tables.

Best wishes, Yves

[EDIT:] There is in fact a warning message about multiple hauls per station that cannot be handled:

Warning message:
In firstPhase(BioticData, datatype, stoxBioticObject, SplitTableAllocation = SplitTableAllocation) :
  StoX: There are more than one **'serialnumber' (HaulKey in StoxBioticData)** for 14 out of 88 'station'(StationKey in StoxBioticData) in the NMDBiotic data. In DefineBioticAssignment() it is currently only possible to asssing all hauls of a station in the map (manual assignment). If certain Hauls should be exclcuded, use FilterStoxBiotic(). More than one serialnumber for the following cruise/station (of the fishstation table of the BioticData):
    2018102/38
    2018102/39
        [...]

What remains confusing is why "serialnumber" is attached to the Station table rather than to the Haul table in the StoxBiotic object.

edvinf commented 1 year ago

Thank you for reporting.

It seems to me that the problem relates to some unexpected defaults in AddToStoxBiotic, and some unexpected mappings between NMDbiotic and StoxBiotic. One of these should probably be considered a bug, but I believe they can be worked around by changing the arguments to AddToStoxBiotic.

In StoxBiotic, positions are associated with the station, not the haul. This is by design and is documented in https://stoxproject.github.io/RstoxData/reference/StoxBioticFormat.html. When converting from NMDbiotic, the position of the first of several hauls at a station is selected. I have not been able to find documentation for that, and I think it should be added to the documentation of the function StoxBiotic.
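The station-level reduction described above can be illustrated with a toy model in plain base R. This is a hypothetical sketch, not RstoxData's implementation: the tables `fishstation`, `station` and `haul` and their simplified columns are assumptions, and the coordinates are copied from the example output in this thread. It keeps the first haul's position per station and merges it back down to the hauls.

```r
# Hypothetical toy version of an NMDBiotic fishstation table: one station (38)
# with two hauls identified by serialnumber. Column names are simplified.
fishstation <- data.frame(
  cruise = "2018102",
  station = 38,
  serialnumber = c(60038, 60039),
  latitudestart = c(58.84533, 58.84267),
  longitudestart = c(2.900333, 2.878833)
)

# Station level (assumed behaviour): keep the first haul's position for each
# cruise/station combination
first <- !duplicated(fishstation[, c("cruise", "station")])
station <- data.frame(
  cruise = fishstation$cruise[first],
  station = fishstation$station[first],
  Latitude = fishstation$latitudestart[first],
  Longitude = fishstation$longitudestart[first]
)

# Haul level: one row per serialnumber, used as the haul key
haul <- data.frame(
  cruise = fishstation$cruise,
  station = fishstation$station,
  HaulKey = fishstation$serialnumber
)

# Merging station information down to the hauls gives both hauls the
# first haul's position; the second haul's own start position is lost
merged <- merge(haul, station, by = c("cruise", "station"))
print(merged)
```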

Serial numbers are correctly used to construct haul IDs rather than station IDs, but AddToStoxBiotic puts serialnumber on the Station table by default. I think this should be considered a bug.
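A similar toy model shows why attaching serialnumber at station level breaks its one-to-one link with HaulKey. It is plain base R with hypothetical table and column names, and it assumes the single station-level value is taken from the first haul; the helper `consistent()` is likewise hypothetical, a simple check that serialnumber agrees with HaulKey on every row of a merged table.

```r
# Hypothetical toy data: one station (38) with two hauls keyed by serialnumber
fishstation <- data.frame(cruise = "2018102", station = 38,
                          serialnumber = c(60038, 60039))
haul <- data.frame(cruise = fishstation$cruise, station = fishstation$station,
                   HaulKey = fishstation$serialnumber)

# Station-level attachment: only one serialnumber survives per station
# (assumed here to be the first), so it is repeated for every haul
stationLevel <- fishstation[!duplicated(fishstation[, c("cruise", "station")]), ]
atStation <- merge(haul, stationLevel, by = c("cruise", "station"))

# Haul-level attachment: join on the haul key itself, so each haul keeps
# its own serialnumber
fishstation$HaulKey <- fishstation$serialnumber
atHaul <- merge(haul, fishstation, by = c("cruise", "station", "HaulKey"))

# Hypothetical consistency check: does serialnumber equal HaulKey on every row?
consistent <- function(x) all(x$serialnumber == x$HaulKey)
consistent(atStation)  # FALSE
consistent(atHaul)     # TRUE
```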

Both problems can be overcome by overriding the defaults of AddToStoxBiotic. For example, replacing the AddToStoxBiotic line in your example with:

```r
dataSB <- AddToStoxBiotic(StoxBioticData = dataSB, BioticData = dataB,
                          VariableNames = c("serialnumber", "latitudestart", "longitudestart"),
                          SplitTableAllocation = "Lowest")
```

should produce:

```
hh %>% filter(HaulKey %in% c(60038:60039) | serialnumber %in% c(60038:60039))
               CruiseKey StationKey HaulKey                Cruise Platform                  Station CatchPlatform            DateTime
1: 2018102/4/2018/4174/2         38   60038 2018102/4/2018/4174/2     4174 2018102/4/2018/4174/2-38          4174 2018-02-19 11:58:54
2: 2018102/4/2018/4174/2         38   60039 2018102/4/2018/4174/2     4174 2018102/4/2018/4174/2-38          4174 2018-02-19 11:58:54
   Latitude Longitude BottomDepth                           Haul Gear TowDistance EffectiveTowDistance MinHaulDepth MaxHaulDepth
1: 58.84533  2.900333     121.765 2018102/4/2018/4174/2-38-60038 3194     2.06982              2.06982       120.73       122.68
2: 58.84533  2.900333     121.765 2018102/4/2018/4174/2-38-60039 3136     0.76123              0.76123           NA           NA
   VerticalNetOpening HorizontalNetOpening TrawlDoorSpread serialnumber latitudestart longitudestart
1:               3.20                   NA              88        60038      58.84533       2.900333
2:               2.56                   NA              NA        60039      58.84267       2.878833
```

Note that serialnumber now corresponds to HaulKey, and that for the second haul the columns 'latitudestart' and 'longitudestart' differ from 'Latitude' and 'Longitude'.

yreecht commented 1 year ago

Thanks @edvinf for the clarification. I will try the workaround you suggest.

edvinf commented 1 year ago

As of v1.10.1, AddToStoxBiotic still puts serialnumber on the Station table rather than the Haul table by default. @arnejohannesholmin, is this to be considered a bug, or should the issue be closed?