dandi / dandi-cli

DANDI command line client to facilitate common operations
https://dandi.readthedocs.io/
Apache License 2.0

1 out of 296 paths are not unique. We will try adding _obj- based on crc32 of object_id #1489

Closed nrsc closed 2 months ago

nrsc commented 3 months ago

Hello all,

I'm running into an issue while trying to organize my dandiset using the CLI. I get the message [ INFO] 1 out of 296 paths are not unique. We will try adding _obj- based on crc32 of object_id, and I can't find any more information about which path is the non-unique one. I've checked the logs, but they give no further details on which path is not unique.

Build info as follows.

[INFO    ] dandi 647577:124894109286400 dandi v0.63.0+5.g37b63509, hdmf v3.14.3, pynwb v2.8.1, h5py v3.11.0

I am using the dandi v0.63.0+5.g37b63509 build to address a metadata problem I was facing previously, but this issue comes up whether or not I use the +5.g37...09 build. I have added more files to the dataset, so I assume that somewhere along the way I added a file that trips the not-unique check. Unfortunately I do not know where to look to find the source of the issue, as the "non-unique path" is not written to the log.
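For readers hitting the same message, here is a minimal sketch of how colliding target paths could be located by hand, and of what a CRC32-based suffix derived from an object_id might look like. This is an illustration under stated assumptions, not dandi-cli's own code: the `planned` mapping, the file names, and both helper functions are hypothetical.

```python
from collections import defaultdict
import zlib


def find_non_unique(planned):
    """Map each target path used more than once to the sources that share it."""
    by_target = defaultdict(list)
    for source, target in planned.items():
        by_target[target].append(source)
    return {t: s for t, s in by_target.items() if len(s) > 1}


def obj_suffix(object_id):
    """Short, stable tag derived from the CRC32 of an object_id string.

    Hypothetical helper: the hex formatting here is an assumption,
    not necessarily the format dandi uses.
    """
    return format(zlib.crc32(object_id.encode("ascii")), "08x")


# Hypothetical plan: source file -> organized path
planned = {
    "raw/a.nwb": "sub-01/sub-01_ses-1.nwb",
    "raw/b.nwb": "sub-01/sub-01_ses-1.nwb",
    "raw/c.nwb": "sub-02/sub-02_ses-1.nwb",
}

duplicates = find_non_unique(planned)
for target, sources in duplicates.items():
    print(target, "<-", sources)
```

Grouping by target rather than comparing pairs keeps the check linear in the number of files, which matters little at 296 files but avoids surprises on larger sets.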

Another error that shows up: Error: 'numpy.bytes_' object has no attribute 'encode'
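As background on that second error: numpy.bytes_ is a subclass of Python's built-in bytes, and bytes objects have a decode method but no encode method. The sketch below reproduces the same kind of AttributeError with plain bytes and shows one defensive way to normalize a value to str. The `as_text` helper and the sample identifier are hypothetical, and whether a bytes-typed object_id is what triggered the error in this report is an assumption.

```python
def as_text(value, encoding="utf-8"):
    """Return str whether value arrived as str or as bytes (or a bytes subclass)."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return str(value)


# Hypothetical identifier stored as bytes rather than str
raw_id = b"0f8fad5b-d9cb-469f-a165-70867728950e"

try:
    raw_id.encode("ascii")  # bytes has .decode(), not .encode()
    failed = False
except AttributeError as exc:
    failed = True
    print(exc)

text_id = as_text(raw_id)
print(text_id)
```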

I will attach the log as well. 2024.08.19-16.50.59Z-647577.log

Other details regarding build information

2024-08-19 09:55:08,798 [ INFO] Loading metadata from 296 files
[Parallel(n_jobs=-1)]: Using backend LokyBackend with 20 concurrent workers.
/home/nrsc/.local/share/r-miniconda/envs/dandi/lib/python3.12/site-packages/hdmf/utils.py:668: UserWarning: Ignoring cached namespace 'hdmf-common' version 1.1.3 because version 1.8.0 is already loaded.
  return func(args[0], **pargs)
/home/nrsc/.local/share/r-miniconda/envs/dandi/lib/python3.12/site-packages/hdmf/utils.py:668: UserWarning: Ignoring cached namespace 'core' version 2.2.4 because version 2.7.0 is already loaded.
  return func(args[0], **pargs)

kabilar commented 3 months ago

Thanks for the report, @nrsc. Could you post a list of the files (with the full path) you are attempting to organize (e.g. using the tree command)?

nrsc commented 3 months ago

Here's the output of all the files in the folder. dandi_output.txt

yarikoptic commented 3 months ago

Hi @nrsc. Since you are a bug magnet (like yours truly), you might want to learn about an option to drop into the Python debugger, which could come in handy for troubleshooting in the future:

❯ dandi --help | grep pdb
  --pdb                           Fall into pdb if errors out

If you specify it (e.g. dandi --pdb organize ...), you will drop into the pdb debugger at the point of the error on numpy.bytes_, which I fail to reproduce ATM. More on how to use pdb at e.g. https://realpython.com/python-debugging-pdb/ .
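For anyone unfamiliar with post-mortem debugging, the sketch below shows the general pattern such an option relies on: catch the exception, hand its traceback to pdb, then re-raise. It uses only the standard library's pdb.post_mortem; the wrapper name `run_with_post_mortem` and the injectable `debugger` parameter are hypothetical, and this is not how dandi itself implements --pdb.

```python
import pdb
import sys


def run_with_post_mortem(func, *args, debugger=pdb.post_mortem, **kwargs):
    """Call func; if it raises, pass the traceback to a debugger, then re-raise."""
    try:
        return func(*args, **kwargs)
    except Exception:
        debugger(sys.exc_info()[2])
        raise


def failing():
    raise ValueError("example failure")


# Non-interactive demonstration: record the traceback instead of opening pdb.
seen = []
try:
    run_with_post_mortem(failing, debugger=seen.append)
except ValueError as exc:
    print("caught:", exc)

# With the default debugger you would land at a (Pdb) prompt, where
#   w        shows the stack,
#   u / d    move up / down one frame,
#   p expr   prints the value of an expression,
#   q        quits the debugger.
```

At the (Pdb) prompt, moving up a frame or two with u and printing the local variables is usually the quickest way to see which value had the unexpected type.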

I will now look into providing more information about those non-unique paths. We should be able to provide a more informative message there!

yarikoptic commented 3 months ago

@nrsc try out

nrsc commented 3 months ago

Thank you @yarikoptic. Will confirm effects of updates once I get the chance to sit back down with this again. Cheers all.

nrsc commented 2 months ago

Hi @yarikoptic. Here is the output from pdb

```
(dandi-cli) nrsc@ai-connect:~/001065$ dandi --pdb organize
2024-08-26 11:00:56,664 [ INFO] Note: NumExpr detected 20 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 8.
2024-08-26 11:00:56,664 [ INFO] NumExpr defaulting to 8 threads.
2024-08-26 11:00:57,209 [ INFO] Logs saved in /home/nrsc/.cache/dandi-cli/log/2024.08.26-18.00.56Z-784710.log
Traceback (most recent call last):
  File "/home/nrsc/.local/bin/dandi", line 8, in <module>
    sys.exit(main())
  File "/home/nrsc/.local/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/home/nrsc/.local/lib/python3.10/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/home/nrsc/.local/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/nrsc/.local/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/nrsc/.local/lib/python3.10/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/home/nrsc/.local/lib/python3.10/site-packages/click/decorators.py", line 45, in new_func
    return f(get_current_context().obj, *args, **kwargs)
  File "/home/nrsc/.local/lib/python3.10/site-packages/dandi/cli/base.py", line 126, in wrapper
    return f(*args, **kwargs)
  File "/home/nrsc/.local/lib/python3.10/site-packages/dandi/cli/cmd_organize.py", line 112, in organize
    organize(
  File "/home/nrsc/.local/lib/python3.10/site-packages/dandi/organize.py", line 822, in organize
    raise ValueError(
ValueError: Only 'dry' or 'move' mode could be used to operate in-place within a dandiset (no paths were provided)
> /home/nrsc/.local/lib/python3.10/site-packages/dandi/organize.py(822)organize()
-> raise ValueError(
(Pdb)
```

I'm wondering whether the line:

2024-08-26T11:00:57-0700 [DEBUG   ] dandi 784710:134806063919104 Caught exception Only 'dry' or 'move' mode could be used to operate in-place within a dandiset (no paths were provided)

can point to the origin of the error.
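Reading only the wording of that message, it appears to come from a guard that runs when organize is invoked with no source paths, rather than from the non-unique-path or numpy.bytes_ problems. The sketch below reconstructs such a guard purely from the error text. The function name `check_in_place` and its arguments are hypothetical, and the real logic in dandi's organize.py may differ.

```python
# Modes the error message names as valid for in-place operation
IN_PLACE_MODES = {"dry", "move"}


def check_in_place(paths, mode):
    """Raise if no paths were given and the mode cannot work in-place.

    Hypothetical reconstruction based on the error text only.
    """
    if not paths and mode not in IN_PLACE_MODES:
        raise ValueError(
            "Only 'dry' or 'move' mode could be used to operate in-place "
            "within a dandiset (no paths were provided)"
        )


# No paths and a mode other than 'dry' or 'move': rejected
try:
    check_in_place([], "auto")
    rejected = False
except ValueError as exc:
    rejected = True
    print(exc)

# Either giving source paths or choosing 'dry' passes the guard
check_in_place(["/data/to/organize"], "auto")
check_in_place([], "dry")
```

If that reading is right, passing the source paths explicitly or selecting a dry run should get past this particular error; dandi organize --help lists the options that are actually available.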

Unfortunately I am not so well versed in Python and Python debugging. I've been an R guy for a while now, but I am interested in contributing as best I can and learning about this process.

Should I pull the organize.py file that you pushed last week and try running the organize function again?

Cheers,

Scott

nrsc commented 2 months ago

Hi @yarikoptic. I updated from the repository, and the message now lists the duplicated paths. That helped me identify and fix the issue. Thank you for providing the patch.

yarikoptic commented 2 months ago

Sorry I missed your prior comment, and thanks for reporting back. It brings us joy to have issues closed! ;)