gjearevoll / BioDivMapping

A pipeline dedicated to analysing and visualising the biodiversity of different taxa in Norway
GNU General Public License v3.0

Bias field is estimated for all datasets when we specify no datasets are biased #128

Open RRTogunov opened 6 months ago

RRTogunov commented 6 months ago

This issue might never actually manifest, but I'm fairly sure that defineBiasFields() marks every dataset with data for bias field estimation even when metadataSummary.csv (which specifies which datasets should have a bias field) is set to FALSE for all datasets. Here is the line in question: https://github.com/gjearevoll/BioDivMapping/blob/d345227eb37120d822bbe4b1a211456fc64a05d5/functions/defineBiasFields.R#L54

If my logic is right, dataTypes is taken from metadataSummary.csv. If the bias column is all FALSE (i.e., no bias fields are to be estimated), then biasedDatasets is empty. In that case all useableDatasets (species datasets with data) are added to suggestedDatasets, which are then all added to biasFieldList and consequently to workflow$biasFields.
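
To make the logic above concrete, here is a minimal sketch with made-up data. It is not the code from defineBiasFields(), only a reconstruction of the behaviour as I understand it. The dataset names and values, and the names `describedBehaviour` and `proposedBehaviour`, are assumptions for illustration. I have also assumed the metadata column is literally called `bias`.

```r
# Toy metadata standing in for metadataSummary.csv (all values are made up).
dataTypes <- data.frame(
  name = c("Dataset A", "Dataset B", "Dataset C"),
  processing = c("presenceOnly", "presenceOnly", "presenceAbsence"),
  bias = c(FALSE, FALSE, FALSE),  # assumed column name; no bias fields requested
  stringsAsFactors = FALSE
)

# Datasets that actually contain data (assumed for this example).
useableDatasets <- c("Dataset A", "Dataset C")

# Rows flagged for bias estimation; zero rows when every bias value is FALSE.
biasedDatasets <- dataTypes[dataTypes$bias, ]

# Behaviour as described in this issue: fall back to every useable dataset.
describedBehaviour <- if (nrow(biasedDatasets) != 0) {
  biasedDatasets$name[biasedDatasets$name %in% useableDatasets]
} else {
  useableDatasets
}

# Behaviour I would expect instead: nothing flagged means no bias fields.
proposedBehaviour <- if (nrow(biasedDatasets) != 0) {
  biasedDatasets$name[biasedDatasets$name %in% useableDatasets]
} else {
  NULL
}

print(describedBehaviour)
print(is.null(proposedBehaviour))
```

With every bias value FALSE, `describedBehaviour` still contains both useable datasets, whereas `proposedBehaviour` is NULL.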

I'm not sure this is the desired behaviour, and I would like someone else to confirm. Should the end of the code be as follows?

# Check if there are datasets that were defined in our metadata file
if (nrow(biasedDatasets) != 0) {
  suggestedDatasets <- biasedDatasets$name[biasedDatasets$name %in% useableDatasets] |>
    # Turn into shortened form
    sapply(FUN = function(x) {
      gsub("[[:punct:]]", "", gsub(" ", "", x))
    })
} else {
  suggestedDatasets <- NULL
}

biasFieldList[[t]] <- suggestedDatasets

Alternatively, if `nrow(biasedDatasets) == 0`, should the bias be estimated for all presenceOnly datasets with data, as follows?

dataTypes$name[dataTypes$processing == "presenceOnly" & dataTypes$name %in% useableDatasets]