agentmorris / MegaDetector

MegaDetector is an AI model that helps conservation folks spend less time doing boring things with camera trap images.
MIT License
110 stars 24 forks source link

Default options not getting passed to integrity_check_json_db.py #82

Closed agentmorris closed 1 year ago

agentmorris commented 1 year ago

I was just trying to validate a COCO for Camera Traps .json file with integrity_check_json_db.py with the following command:

python integrity_check_json_db.py "/Users/nathaniel.rindlaub/Downloads/8d3b8eaba5f3acccbe3af01ca0ffdfe2_coco.json"

And ran into the following error on line 196: AttributeError: 'Namespace' object has no attribute 'bRequireLocation'.

I could be misunderstanding something (not a Python guy) but looks to me like options will never be None–and thus the defaultOptions will never get assigned to it–because we pass the parsed args directly from argparse into integrity_check_json_db() as the second param, and the args will always include some default values and a value for jsonFile.

...
def integrity_check_json_db(jsonFile, options=None):
    '''
    jsonFile can be a filename or an already-loaded json database

    return sortedCategories, data, errorInfo
    '''

    if options is None:   

        options = IntegrityCheckOptions() # looks like this line might never get reached
...
...
    args = parser.parse_args()    
    integrity_check_json_db(args.jsonFile,args)
...

I fixed this locally by changing bRequireLocation -> dDoNotRequireLocation throughout and adding an arg for it and for nThreads to argparse:

    parser.add_argument('--bDoNotRequireLocation', action='store_true', 
                        help='Do not require location to be specified in image records (default is false)')
    parser.add_argument('--nThreads', action='store', type=int, default=10, 
                        help='Number of threads')

I'd be happy to submit a PR for this fix if that's at all helpful.


Issue cloned from Microsoft/CameraTraps, original issue posted by nathanielrindlaub on Nov 22, 2022.

agentmorris commented 1 year ago

Good catch. The deeper issue is that there are a handful of scripts that we use all the time, but never run from the command line (only from other Python code), so our command-line testing is... suboptimal.

The medium-deep issue is that we should not have been passing "args" like it was an options object; our normal paradigm is to use ct_utils.args_to_object to convert a set of command-line arguments to an options class (in this case, an instance of IntegrityCheckOptions). I did that here, and also added a "--bAllowNoLocation" option (although after this change, this would no longer have caused the script to crash, it's just a useful option to expose).

Thanks for finding this!


(Comment originally posted by agentmorris)

agentmorris commented 1 year ago

Awesome! Thanks for jumping on this so quickly!

Is there some basic documentation on how to get up and running w/ this repo locally so that I can take advantage of all the fantastic data management & COCO utility tools? Again, Python novice here, but I don't see a requirements.txt anywhere or instructions on what version of python to be running, so thus far I've just been making some guesses and installed dependencies one by one when it complained at me.

Along those lines, with the latest updates to integrity_check_json_db.py I'm now getting a ModuleNotFoundError: No module named 'ct_utils' error... I can see that ct_utils.py is in the root directory, and I tried setting $PYTHONPATH to point to that, but it still isn't finding the module. This is probably Python 101, so apologies in advance.


(Comment originally posted by nathanielrindlaub)

agentmorris commented 1 year ago

Indeed, you can roughly divide this repo into:

  1. All the scripts needed to run MegaDetector (and some classifiers) (and the modules those scripts depend on)
  2. Everything else

I'd like to think that we're pretty meticulous about documentation and environment management for (1), but for (2)... not so much. We used to maintain separate instructions for (2), and we still have a mostly-correct conda environment file for (2), but so few users were doing anything other than running MegaDetector that we stopped explicitly supporting (2).

That said, we're not that far off: I use most of these modules regularly, and I only have one conda environment, which is almost just the recommended environment for running MegaDetector.

So if you are likely to use non-MegaDetector-related scripts in this repo, I recommend following the MegaDetector setup instructions to set up a conda environment, which will give you the correct Python version, the correct dependent repos (e.g. ai4eutils), and almost all of the correct Python packages. The only thing you should ever need to do is occasionally pip-installing something that isn't present in the MegaDetector conda environment file.

The only downside of this approach if you're not running MegaDetector at all in this environment is that you're carrying around a giant PyTorch install. If you don't want to deal with installing PyTorch during the setup process, YMMV, but you should be able to substitute "environment.yml" for "environment-detector.yml" during the setup process.

Hope that helps!


(Comment originally posted by agentmorris)

agentmorris commented 1 year ago

Also, a more direct answer to your question about requirements.txt: because we have always used Anaconda for development, Conda environment files (in particular environment-detector.yml) play the role that requirements.txt files play in other repos.


(Comment originally posted by agentmorris)

agentmorris commented 1 year ago

Got it! Thank you Dan! I'll look through the MegaDetector setup instructions to get that conda env up and running. Cheers.


(Comment originally posted by nathanielrindlaub)