AgPipeline / issues-and-projects

Repository for issues and projects
BSD 3-Clause "New" or "Revised" License
3 stars 1 forks source link

SciF implementation fetching BETYdb plots needs to be able to filter on date #191

Closed Chris-Schnaufer closed 3 years ago

Chris-Schnaufer commented 4 years ago

The current behavior or issue Currently the BETYdb plot retrieval doesn't filter on date, opening the possibility of the wrong set of plots to be used by plot clipping.

The steps taken to reproduce the behavior or issue, or specify a location where the steps were recorded Specify a BETYdb instance that has multiple plots intersecting the same image over several seasons.

Expected behavior The correct plots are used based upon the date specified during processing.

Screenshots N/A

Add other supporting information that may be useful N/A

Completion criteria This is done when the correct date is used to filter plots to be used

Developer notes The betydb2geojson.py file should include the fetched start and end dates for plots in the GeoJSON written this will allow plot clipping to check for these dates in the properties key in the GeoJSON and filter on them if found.

Chris-Schnaufer commented 4 years ago

There's a need for specifying the additional filters to the BETYdb fetching code. I don't want to add them as another parameter to the SciF docker run command since it'll make handling '--clean' harder. My thought is to append a colon followed by the parameters to end of the URL. For example: https://terraref.ncsa.illinois.edu/bety:name=Season 5,description=diversity panel.

See the following for the current documentation (more to come): https://github.com/Chris-Schnaufer/drone-makeflow/blob/4db68eae7dd471ee1f49101d8124f7efdbc92aab/betydb2geojson.py#L38

Added per @julianpistorius request:

docker run --rm -v "${PWD}/inputs:/scif/data/odm_workflow/images" -v "${PWD}/outputs:/output" agdrone/canopycover-workflow:latest run short_workflow "https://terraref.ncsa.illinois.edu/bety:name=Season 5,description=diversity panel"

Original message chain: https://cyverse.slack.com/archives/DKV587DRU/p1597340570005900

julianpistorius commented 4 years ago

Yeah, that's tricky. The colon separated solution is fragile and ugly for users to deal with.

Instead I would solve it by moving configuration to jx-args.json instead of passing everything in as command line arguments, as documented here:

https://github.com/AgPipeline/issues-and-projects/issues/214

At some point (now?) it becomes easier to use the built-in configuration mechanism for Makeflow.

dlebauer commented 4 years ago

Just to make sure this is on the right path ...

Chris-Schnaufer commented 4 years ago

@dlebauer I think what you're saying is that for any Orthomosaic there will only be one relevant set of sites for a particular day that intersect that site. In other words, if a drone is flown over a large field, there can only be on experiment for that field when filtered by date - removing the need for additional filtering?

dlebauer commented 4 years ago

for a large field, there can only be on experiment for that field when filtered by date

yes

Chris-Schnaufer commented 4 years ago

Here is an image that made me think otherwise. I think I'm missing something around this. How does should this interact with BETYdb (so I can be sure I have it right)? Thanks. (Exported and scaled so fidelity is lost) cimmyt_180322_orthophoto

julianpistorius commented 4 years ago

@Chris-Schnaufer For a more general solution for the problem you're having, the following related/complementary improvements would make everything much simpler and more robust (especially items 1 and 4):

  1. I would like to not have SCIF entrypoints for each different workflow #232
  2. Numbered arguments are error prone and fragile #233
  3. Potential security problem with how we treat jx-args.json #234
  4. Using sub-workflows instead of 'pre-flight' shell scripts
Chris-Schnaufer commented 4 years ago

Thanks @julianpistorius . I hadn't forgotten about those. David appears to be saying this issue isn't needed (only need date filter) and I'm trying to understand that.

dlebauer commented 4 years ago

@Chris-Schnaufer I meant you need to filter experiments on date, find the plots associated w/ the experiment, and then filter on location - the intersection of the image and the experimental plots. The image you presented looks like two experiments.

Chris-Schnaufer commented 4 years ago

@dlebauer In the above case, it's OK that both Experiments are processed at the same time? There isn't a chance of plot name conflict? I'm trying to see if we'll ever need additional filters on the Experiments returned from BETYdb.

dlebauer commented 4 years ago

there would be a chance of plot name conflict - a simple solution would be to organize the outputs as experimentname/plotname or experimentname/date/plotname or experimentname/datetime/plotname when there are >1 experiments, >1 dates per experiment or >1 captures per day (respectively).

perhaps more simply, consistently use experimentname/datetime/plotname

Chris-Schnaufer commented 4 years ago

Thanks @dlebauer and @julianpistorius

In conclusion:

  1. Never a need for additional filters (all intersecting plots for all experiments with the right date range all the time)
  2. Change the resulting file layout to use: experimentname/datetime/plotname path particle
Chris-Schnaufer commented 4 years ago

Removing from Sprint since this requires changes to output folder structures

Chris-Schnaufer commented 3 years ago

Another way to resolve this using folder structures is to assume the user knows what they're doing: they can specify any folder they want when they run the SciF apps, so they can manage where they want stuff to end up.

Chris-Schnaufer commented 3 years ago

Closing as no longer an issue