Open mashehu opened 1 month ago
@simleo is this a bug? I'm not familiar enough with the code, but I assumed that file entities would use the full file path as @id
.
@mashehu and I agreed on the WRROC call that this is a bug.
As a workaround, you should be able to use the dest_path
argument to manually set the intended file path within the crate, and therefore the @id
:
crate.add_file(fn, dest_path=fn, properties={"programmingLanguage": {"@id": "#nextflow"}})
Also, you might like to look at the add_workflow()
function which could save you a few lines elsewhere. Something like:
crate.add_workflow( # sets @type and conformsTo according to Workflow RO-Crate spec
fn,
dest_path=fn,
main=True, # sets the added workflow as main entity
properties={ . . . },
lang="nextflow" # adds the #nextflow entity automatically and connects it to programmingLanguage
lang_version="X.Y.Z", # sets version on #nextflow
)
Thanks @elichad, specifying dest_path
works.
add_workflow()
is indeed helpful, is it documented somewhere?
add_workflow()
is not documented yet - I myself only learned about it very recently, and raised #186 to flag that we need to fix that gap in the documentation
mashehu and I agreed on the WRROC call that this is a bug.
As a workaround, you should be able to use the
dest_path
argument to manually set the intended file path within the crate, and therefore the@id
It's not a bug, and using dest_path
is not a workaround, but the normal way to set the data entity's path / id within the crate, which is always relative to the crate's root directory (which is not known until the crate is written).
When dest_path
is not specified, the default behavior is to place the data entity (file or directory) at the top level, thus setting the id to the basename. Using the source path as the destination path does not work in general:
dest_path
the method throws a ValueError
)fetch_remote
set to True
(in this case the file is downloaded and added to the crate as a local file)StringIO
or BytesIO
(this is supported), in which case there is no source pathIf you want to check exactly what happens, take a look at FileOrDir
's __init__
method and File
's write
method
A usage example can be found in repo2rocrate.
Usage of dest_path
is demonstrated at the beginning of the README, in Creating an RO-Crate. I will review the documentation to see if I can better clarify how things work.
Hi,
I am currently working on adding RO-crates to the nf-core CLI tools: https://github.com/nf-core/tools/pull/2680
nf-core pipelines usually have a root
main.nf
file, which should be aComputationalWorkflow
entity, but they also have modules with their own nestedmain.nf
e.g.modules/local/cat_additional_fasta/main.nf
. Trying to add all the different nested files usingcrate.add_file(fn, properties={"programmingLanguage": {"@id": "#nextflow"}})
wherefn
is the file path, e.g.,modules/local/cat_additional_fasta/main.nf
results in only onemain.nf
entry. How do I get the full path as the entry id to avoid overwriting duplicates?Additionally, a warning that an
@id
already exists before overwritting it, would be helpful.