Closed. david-sitsky closed this 7 months ago.
As a workaround, you can edit the metadata.json and remove the properties. You may also need to change the file references from relative to absolute. Loading the model from the edited metadata.json would avoid the long path problem.
As for the broader use of properties inside model paths, the main reason we do so is to tell apart different artifacts within the same metadata. Properties give a fairly clean directory tree in most cases. Note that this only applies to the local cache of downloaded models and datasets. The code for it is in Artifact.getResourceUri() if you are interested.
For a complete solution, we would need to replace it with a new system. The obvious one is to use the artifact name instead of the properties. This requires that all artifacts have names and that they are unique within a metadata.json file. I am not sure off the top of my head that this is true, so we would have to verify. Assuming that is fine, it should be a fairly easy change. @frankfliu what do you think?
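To make the trade-off concrete, here is a rough sketch comparing the two cache layouts discussed above. It is not the logic in Artifact.getResourceUri(). The class, the method names, the directory layout and the placeholder property values are all assumptions made for illustration. The only fixed fact it relies on is the classic Windows MAX_PATH limit of 260 characters.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of DJL.
final class CachePathSketch {

    // Classic Windows limit (MAX_PATH); it includes the terminating null character.
    static final int WINDOWS_MAX_PATH = 260;

    // Assumed layout: one sub-directory per property value, in sorted key order.
    static Path propertyBasedDir(Path cacheRoot, String version, Map<String, String> properties) {
        Path dir = cacheRoot.resolve(version);
        for (String value : new TreeMap<>(properties).values()) {
            dir = dir.resolve(value);
        }
        return dir;
    }

    // Assumed alternative: artifact name plus version, independent of the property count.
    static Path nameBasedDir(Path cacheRoot, String artifactName, String version) {
        return cacheRoot.resolve(artifactName).resolve(version);
    }

    // True if the path string leaves room for the terminating null within MAX_PATH.
    static boolean fitsWindowsMaxPath(Path path) {
        return path.toString().length() < WINDOWS_MAX_PATH;
    }

    public static void main(String[] args) {
        Path root = Paths.get("cache", "repo", "model");

        // Placeholder properties standing in for a model with ~100 language entries.
        Map<String, String> props = new TreeMap<>();
        for (int i = 0; i < 100; i++) {
            props.put("lang" + i, "l" + i);
        }

        Path byProps = propertyBasedDir(root, "0.0.1", props);
        Path byName = nameBasedDir(root, "multilingual-e5-small", "0.0.1");

        System.out.println("property-based fits: " + fitsWindowsMaxPath(byProps));
        System.out.println("name-based fits:     " + fitsWindowsMaxPath(byName));
    }
}
```

With the name-based layout the path length no longer grows with the number of properties, which is the point of the proposal. It only works if artifact names are present and unique.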
The properties are generated by the model importing tool, so we need to update the tool to combine the languages into a single property.
For now, we can manually change the metadata.json.
I scanned the existing model zoo: for all models (except mxnet yolo), artifact name + version is unique within metadata.json.
So we don't actually need to use properties in the file path to avoid path clashes.
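For anyone who wants to repeat that kind of scan, the uniqueness check could look roughly like this. This is a sketch under assumptions, not the script used for the scan. Entry is a stand-in for an artifact entry that has already been parsed out of metadata.json (it is not DJL's Artifact class), and the names are assumed to be non-null.

```java
import java.util.LinkedHashSet;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of DJL.
final class ArtifactKeyCheck {

    // Minimal stand-in for one artifact entry of a metadata.json file.
    static final class Entry {
        final String name;
        final String version;

        Entry(String name, String version) {
            this.name = name;
            this.version = version;
        }
    }

    // Returns every "name/version" key that occurs more than once, in first-seen order.
    static Set<String> duplicateKeys(List<Entry> entries) {
        Set<String> seen = new HashSet<>();
        Set<String> duplicates = new LinkedHashSet<>();
        for (Entry entry : entries) {
            String key = entry.name + "/" + entry.version;
            if (!seen.add(key)) {
                duplicates.add(key);
            }
        }
        return duplicates;
    }
}
```

An empty result means name + version is enough to give each artifact in that file its own cache directory.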
@frankfliu - many thanks for fixing this. I can confirm on Windows this now works as expected using 0.27.0-SNAPSHOT.
Description
While I work on Linux, I have to write software that also works on Windows. The multilingual-e5-small model has a large number of properties, one for each language it supports. You can see this when deploying this model on Linux:
Expected Behavior
That the model can be loaded on Windows.
Error Message
How to Reproduce?
Open the above model on a Windows machine.
Thoughts
Why are model properties being represented as sub-directories? This seems an expensive way to do it, when a properties file would use fewer filesystem resources. Various other filesystems also have limits that this representation is likely to hit.
Can this be easily changed, or is this a more involved change? I'm happy to have a look if some guidance can be provided.