Open BryonLewis opened 3 years ago
Categories:
Each pipeline/training config:
sefsc
Description:
I made a rough first attempt here:
```jsonc
{
  "argumentTypes": {
    "kwiverPipeline": {
      "platforms": ["web", "desktop"],
      "supports": ["image-sequence", "video"],
      "arguments": {
        "global": {
          "pipeline": {
            "arg": "-p {pipeline}",
            "type": "pipe"
          },
          "input_file": {
            "type": "txt",
            "description": "Either a list of images for the folder or a single input video file",
            "arg": "-s input:video_filename={input_file}"
          },
          "detector_output_file": {
            "arg": "-s detector_writer:file_name={detector_output_file}",
            "description": "Output for the detections"
          },
          "track_output_file": {
            "type": "viame_csv",
            "arg": "-s track_writer:file_name={track_output_file}"
          }
        },
        "video": {
          "video_reader": {
            "arg": "-s input:video_reader:type=vidl_ffmpeg",
            "description": "Additional input required for video reading"
          }
        }
      }
    },
    "kwiverPipelineInput": {
      "platforms": ["web", "windows", "linux", "mac"],
      "supports": ["image-sequence"], // Right now it only works on image sequences
      "arguments": {
        "global": {
          "detection_input_file": {
            "arg": "-s detection_reader:file_name={detection_input_file}",
            "type": "viame_csv"
          },
          "track_input_file": {
            "arg": "-s track_reader:file_name={track_input_file}",
            "type": "viame_csv"
          }
        }
      }
    },
    "kwiverTraining": {
      "platforms": ["web", "desktop"],
      "supports": ["image-sequence", "video"],
      "arguments": {
        "global": {
          "input_folder_file_list": {
            "arg": "-il {input_folder_file_list}",
            "type": "txt",
            "description": "List of folders and/or videos used for training inputs"
          },
          "input_ground_truth_list": {
            "arg": "-it {input_ground_truth_list}",
            "description": "List of the groundTruth viame_csv files for each item in the folder_file_list"
          },
          "training_configuration_file": {
            "arg": "-c {training_configuration_file}",
            "type": "conf"
          },
          "prevent_interruptions": {
            "arg": "--no-query",
            "description": "Argument to prevent training from needing user input"
          }
        },
        "desktop": {
          "no-adv-prints": {
            "arg": "--no-adv-prints",
            "description": "Changes console logging style for Windows"
          },
          "no-embedded-pipe": {
            "arg": "--no-embedded-pipe",
            "description": "Desktop requirement to train pipelines properly."
          }
        }
      }
    }
  },
  "Pipes": [
    {
      "name": "Generic Tracker",
      "filename": "tracker_generic.pipe",
      "backend": "kwiver", // Future: other backend pipes that can be run
      "argumentTypes": ["kwiverPipeline"],
      "description": "Generic tracker which returns tracks with confidence. Maybe some more about the system",
      "types": ["vertebrate", "invertebrate"], // These might not match up, but you get the idea
      "category": "Trackers", // Trackers | Detectors | Training | Utilities
      "tags": ["generic", "tracker"], // Hopefully used in the future for filtering, e.g. removing 'netharn' or 'local' or 'svm'
      "requirements": {
        "gpu": false,
        "gpuMinMemoryGBs": "1" // 0 means no minimum
      }
    },
    {
      "name": "User Input Detections Tracker",
      "filename": "tracker_track_user_selections.pipe",
      "backend": "kwiver",
      "argumentTypes": ["kwiverPipeline", "kwiverPipelineInput"], // Takes both the kwiverPipeline and kwiverPipelineInput args
      "description": "Provide detections on the first frame of objects to produce tracking results",
      "types": [], // None for this pipeline
      "category": "Utilities", // I don't know if this is a utility or not
      "tags": ["input", "tracker"],
      "requirements": {
        "gpu": true,
        "gpuMinMemoryGBs": "1" // Optional; 0 means no minimum
      }
    },
    {
      "name": "Train Netharn Cascade",
      "filename": "train_netharn_cascade.viame_csv.conf",
      "backend": "kwiver",
      "argumentTypes": ["kwiverTraining"],
      "description": "Takes input detections and trains a model of the corresponding types",
      "types": [],
      "category": "Training",
      "tags": ["training", "netharn", "cascade", "viame_csv"],
      "requirements": {
        "gpu": true,
        "gpuMinMemoryGBs": "2" // Optional
      }
    }
  ]
}
```
This is a little heavier than I think we need right now. I'm worried that it may be over-engineered, and I'm certain that this change takes complexity currently represented in code and moves it into a configuration scheme.
I'd rather deal with complexity in code because I think TypeScript is safer and more expressive than JSON, and provides better tools to manage that complexity. We can also absorb changes much more easily in code.
It also has a few features that control things we don't support (and have no plan to support).
My proposal would be to...

- Drop the `argumentTypes` schema and express these entirely in code. I'm fine with keeping the `argumentTypes` pipeline parameter, but I'd rather that be a statically defined function than something with dynamic behavior driven by JSON. `argumentTypes[].kwiverPipelineInput` would correspond to a function that applies the proper arguments to the kwiver input command (and generates the input, and whatever else) rather than having that read more fine-grained config from a schema. This is basically what we already do, but broken out into coarse-grained groups of arguments instead of this fine-grained config-driven approach.
- Drop `Pipes[].requirements`, as we have no current plan to support the use of that information. CPU-only pipelines and memory management aren't in our backlog. It doesn't hurt to have these, so I don't feel as strongly here, but at first glance it seems a bit YAGNI.

The rest of the pipes schema looks great. `tags`, `types`, etc. are all awesome. I'd like to follow up about some external factors, like where to look for this file.

This isn't a total rejection of `argumentTypes`, but I don't think there's a strong case for them yet. This could be broken into multiple parts: first implement `argumentTypes` in TS, then later, if that choice appears to be causing problems, migrate that complexity into config.
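To make the "statically defined function" idea concrete, here is a rough sketch of how one argument group from the JSON above could become plain code. This is only an illustration, not a committed design; the names `kwiverPipelineArgs` and `PipelineInput` are hypothetical.

```typescript
// Hypothetical sketch: an "argumentTypes" group becomes a plain function
// that returns the kwiver CLI arguments, instead of being driven by JSON.

interface PipelineInput {
  pipelineFile: string;
  inputFile: string;
  detectorOutputFile: string;
  trackOutputFile: string;
  isVideo: boolean;
}

// Corresponds roughly to the "kwiverPipeline" group in the JSON schema above.
function kwiverPipelineArgs(input: PipelineInput): string[] {
  const args = [
    `-p ${input.pipelineFile}`,
    `-s input:video_filename=${input.inputFile}`,
    `-s detector_writer:file_name=${input.detectorOutputFile}`,
    `-s track_writer:file_name=${input.trackOutputFile}`,
  ];
  if (input.isVideo) {
    // Extra reader configuration only needed for video inputs.
    args.push('-s input:video_reader:type=vidl_ffmpeg');
  }
  return args;
}

// Usage example:
const args = kwiverPipelineArgs({
  pipelineFile: 'tracker_generic.pipe',
  inputFile: 'input.mp4',
  detectorOutputFile: 'detections.csv',
  trackOutputFile: 'tracks.csv',
  isVideo: true,
});
console.log(args.join(' '));
```

The conditional for video inputs replaces the per-datatype `"video"` section of the config, which is where most of the schema's complexity lived.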
What do you think?
I think my goal was just to have `argumentTypes` defined someplace as a union between different sets of arguments for pipes, datatypes (video/image-sequence), and platforms (web/desktop), without having to look through each individual pipe for its parameters. Having it live in the TS and Python backends is better, I think, as long as we have good documentation for what each type expects/requires.
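That "union of argument sets" could also be captured at the type level rather than in JSON. A minimal sketch, assuming hypothetical names (`ArgumentGroup`, `buildArgs`) that are not part of any existing DIVE API:

```typescript
// Hypothetical sketch: the union of argument sets across datatypes and
// platforms expressed as TypeScript types instead of a JSON schema.

type Platform = 'web' | 'desktop';
type DatasetType = 'image-sequence' | 'video';

interface ArgumentGroup {
  platforms: Platform[];
  supports: DatasetType[];
  // Builds the CLI arguments for a given dataset type.
  buildArgs(datasetType: DatasetType): string[];
}

// Rough analogue of the "kwiverPipelineInput" group from the JSON above.
const kwiverPipelineInput: ArgumentGroup = {
  platforms: ['web', 'desktop'],
  supports: ['image-sequence'], // currently image sequences only
  buildArgs: () => [
    '-s detection_reader:file_name=detections.csv',
    '-s track_reader:file_name=tracks.csv',
  ],
};

// A pipe declares the union of groups it needs:
const argumentTypes: ArgumentGroup[] = [kwiverPipelineInput];
```

The compiler then enforces that every group declares its platforms and supported datatypes, which is the documentation-by-types benefit of keeping this in TS.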
@subdavis This seems to be completed, can you verify?
That PR was related, but does not fix the actual topic of this PR. We still have a lot of work to do to come up with a way to reliably describe the configurations available in DIVE.
@subdavis Just want to add some thoughts to the code vs. JSON schema discussion. I don't have input on the specifics of implementation, besides mentioning that I think it would be useful to consider how custom pipelines specific to a group or project fit in. Looking at the add-on packages now available in VIAME, I think it's clear that projects need to be able to use their own pipelines with these interfaces, and there's a good argument for a way for people to have custom pipelines in DIVE. Though this is not totally within the subject of this issue, I think it could be useful to consider in this work, as it could make it easier or harder for support of custom pipelines to be added in the future. A few notes:
Just some thoughts. Even though we're moving toward running pipelines elsewhere, I do think there is benefit to having some of our pipelines in DIVE, and I imagine other groups with VIAME add-ons may feel similarly, if not now then likely at some point.
Edit: Also I'm only talking about processing pipelines like detectors, trackers, etc... not training pipelines
I'd sort of like to abstract this to have some `function (pipeline_name: str) -> Capabilities and Requirements Dict`. Right now that's based on name, but in the future we can incorporate other out-of-band knowledge.
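The name-keyed lookup described above could look roughly like this. It's a sketch only: `pipelineCapabilities`, the `Capabilities` shape, and the prefix-matching rules are all assumptions, not existing DIVE code.

```typescript
// Hypothetical sketch of "function (pipeline_name) -> capabilities/requirements".
// Today keyed purely on the filename; later this could fold in out-of-band
// knowledge (add-on metadata, user config, etc.).

interface Capabilities {
  supports: ('image-sequence' | 'video')[];
  category: 'Trackers' | 'Detectors' | 'Training' | 'Utilities';
  requirements: { gpu: boolean; gpuMinMemoryGBs: number };
}

function pipelineCapabilities(pipelineName: string): Capabilities {
  // Illustrative naming conventions, mirroring the example pipes above.
  if (pipelineName.startsWith('tracker_')) {
    return {
      supports: ['image-sequence', 'video'],
      category: 'Trackers',
      requirements: { gpu: false, gpuMinMemoryGBs: 1 },
    };
  }
  if (pipelineName.startsWith('train_')) {
    return {
      supports: ['image-sequence', 'video'],
      category: 'Training',
      requirements: { gpu: true, gpuMinMemoryGBs: 2 },
    };
  }
  // Fallback for pipelines we don't recognize.
  return {
    supports: ['image-sequence'],
    category: 'Utilities',
    requirements: { gpu: false, gpuMinMemoryGBs: 0 },
  };
}
```

Because the lookup is a single function, swapping the name-based heuristic for a richer source later is an internal change with no schema migration.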
Gathering some requirements and ideas for a structure to hold the JSON pipelines, as well as possibly the training configs for the system.
My guess at requirements: