datalad / datalad-catalog

Create a user-friendly data catalog from structured metadata
https://datalad-catalog.netlify.app
MIT License
15 stars 12 forks source link

crash on doc commands #435

Open bpinsard opened 6 months ago

bpinsard commented 6 months ago

Running

datalad -l 1 catalog-workflow --type new --catalog ./ --dataset /path/to/dataset/ --extractor metalad_core

I get the following traceback:

....
....
....
[Level 5] Finished setup_parser
[DEBUG  ] Parsing known args among ['/home/basile/.local/bin/datalad', '-l', '1', 'catalog-workflow', '--type', 'new', '--catalog', './', '--dataset', '/home/basile/data/tests/cneuromod/', '--extractor', 'metalad_core']
[Level 2] Entered eval_func for <function Workflow.__call__ at 0x70b50ea5f0a0>
[DEBUG  ] Determined class of decorated function: <class 'datalad_catalog.workflow.Workflow'>
[Level 2] Returning generator_func from eval_func for <class 'datalad_catalog.workflow.Workflow'>
[DEBUG  ] Command parameter validation for <class 'datalad_catalog.workflow.Workflow'>
[Level 8] 1 command parameter constraint violation
mode=<no value>
  missing required arguments [main.py:_run_with_exception_handler:190,exec.py:call_from_parser:107,interface_utils.py:_execute_command_:154,parameter.py:__call__:289]
[ERROR  ] 1 command parameter constraint violation
mode=<no value>
  missing required arguments
[Level 5] Exiting

Cannot figure out how --type does not get into the mode variable.

dlad_wtf.txt

jsheunis commented 5 months ago

Hey @bpinsard. I have no ideas off the top of my head, but I will try and reproduce this locally.

jsheunis commented 5 months ago

Interim note: I have been able to reproduce the issue locally. I'm running on a mac, python==3.10.12, datalad-next==1.3.0, datalad_catalog==1.1.1. This issue isn't picked up by existing tests, they all run successfully.

jsheunis commented 5 months ago

@bpinsard It seems like I've found a solution. For some reason that I haven't been able to figure out yet, the mismatch between the python variable mode and the Argparse argument type is causing trouble (see code):

mode=Parameter(
            # cmdline argument definitions, incl aliases
            args=("-m", "--mode"),
            # documentation
            doc="""Which type of workflow to run: one of ['new', 'update']""",
        ),

If I change the argument name to mode, the issue disappears. I am yet to figure out if this is related to some argparse changes or perhaps datalad-next updates...

When I ran the catalog-workflow command after the fix, I encountered a different new issue related to the Dataset class, which I also fixed. After that, the command ran fully (although the output wasn't great because I couldn't clone/install any of the subdatasets in https://github.com/courtois-neuromod/cneuromod).

Both fixes are included in this patch, also displayed below:

diff --git a/datalad_catalog/workflow.py b/datalad_catalog/workflow.py
index e10011a..4828fdc 100644
--- a/datalad_catalog/workflow.py
+++ b/datalad_catalog/workflow.py
@@ -117,7 +117,7 @@ class Workflow(ValidatedInterface):
         ),
         mode=Parameter(
             # cmdline argument definitions, incl aliases
-            args=("-t", "--type"),
+            args=("-m", "--mode"),
             # documentation
             doc="""Which type of workflow to run: one of ['new', 'update']""",
         ),
@@ -244,7 +244,7 @@ class Workflow(ValidatedInterface):
         )
         if mode == "new":
             yield from super_workflow(
-                ds=dataset,
+                ds=dataset.ds,
                 cat=catalog,
                 extractors=extractor,
                 config_file=config_file,
@@ -253,8 +253,8 @@ class Workflow(ValidatedInterface):
             )
         if mode == "update":
             yield from update_workflow(
-                superds=dataset,
-                subds=subdataset,
+                superds=dataset.ds,
+                subds=subdataset.ds,
                 catalog=catalog,
                 extractors=extractor,
                 **res_kwargs,

If you can download and apply that locally with git apply workflow_patch.patch and let me know if this fixes things for you, that would be great.

bpinsard commented 5 months ago

thanks! I will try that asap and let you know.

bpinsard commented 5 months ago

It passes the error I had before. However I get 404 on the generated webpage, not sure what I am doing wrong. If I then add the different datasets to the catalog it shows in the web interface.

jsheunis commented 5 months ago

It passes the error I had before

Good to hear! I will proceed with a PR to bring this change into main.

However I get 404 on the generated webpage, not sure what I am doing wrong.

With the "generated webpage" do you mean a locally served webpage using datalad catalog-serve on the location of the generated catalog?

If I then add the different datasets to the catalog it shows in the web interface.

So do you get a 404 when navigating to the root of the served location (e.g. http://localhost:8000) but the datasets render successfully when navigating to the specific ID and version (http://localhost:8000/dataset/dsID/dsVERSION)? If that is the case, it seems to me that the homepage of the catalog might not have been set correctly (or at all). Do you encounter all of this after running the exact command you posted in your first comment in this issue, or did you run any other commands?

jsheunis commented 3 months ago

@bpinsard did you manage to get this resolved or is the issue still a problem?