Narasimha1997 closed this 4 years ago.
I'm not sure about its usage. Nothing in the code uses the metadata yet, so we don't know what data will be saved or what it will be used for. The class should store the information it needs, in the form that best suits how that information is used. For now, I think this is too general an approach for ideas that have not yet been worked out or implemented. Once the ideas are settled, we will probably find a better, more explicit approach for them.
Yes, you are right. But right now we can save/load only the learnt rules. If users want to include any other miscellaneous information, there is no way for them to save it. In other words, a saved file has little meaning unless it explains what it is, who created it, why it was created and what it can do. Most file formats have this kind of feature, and it contributes to explainability. You can provide this as an add-on, as it won't affect any core features. Users can save any information they wish. For example, some users would like to save the URLs that were used for scraping; others might add a description, etc.
For users who rely on rules created by others, this information would be useful, as it tells them what the rules are for.
It's just my opinion; you make the call.
I understand your point. But this doesn't have any structure. For example, if everyone uses their own structure for adding the author, description, etc., how can that information be used in a consistent way?
Got it! So which fields would you like to include for now?
I don't know if this PR will be approved, but just in case: in line 62,
metadata_to_save = metadata if (metadata and metadata != {}) else self._metadata
you can shorten if (metadata and metadata != {}) to if metadata, since an empty dict evaluates to False anyway.
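A quick check of that suggestion, using a hypothetical stand-in (stored) for self._metadata, shows the long condition, the short one, and the even shorter metadata or stored all pick the same object for None, an empty dict and a non-empty dict:

```python
# Hypothetical stand-in for self._metadata in the quoted line.
stored = {"author": "someone"}

def pick_verbose(metadata):
    # Condition as written in the PR: explicit comparison with an empty dict.
    return metadata if (metadata and metadata != {}) else stored

def pick_short(metadata):
    # Suggested form: None and {} are both falsy, so the comparison is redundant.
    return metadata if metadata else stored

def pick_or(metadata):
    # `x or y` returns x when x is truthy, otherwise y: same result again.
    return metadata or stored

for candidate in (None, {}, {"description": "demo"}):
    assert pick_verbose(candidate) is pick_short(candidate) is pick_or(candidate)
```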
In the latest commit I have fixed the basic structure of the metadata. It includes the fields author, author_email, model_name, description, target_urls and keywords. We can extend this in the future if there is a requirement. These are the basic fields that any saved model would be expected to have.
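One way to pin down such a fixed structure, and address the concern about everyone inventing their own keys, is a dataclass that rejects unknown fields. This is a minimal sketch only: RulesMetadata and from_dict are hypothetical names, not the code in the commit; only the field names come from the list above.

```python
from dataclasses import asdict, dataclass, field, fields
from typing import List, Optional

@dataclass
class RulesMetadata:
    # Hypothetical class; the field names follow the list in the comment above.
    author: Optional[str] = None
    author_email: Optional[str] = None
    model_name: Optional[str] = None
    description: Optional[str] = None
    target_urls: List[str] = field(default_factory=list)
    keywords: List[str] = field(default_factory=list)

    @classmethod
    def from_dict(cls, data: dict) -> "RulesMetadata":
        # Reject keys outside the agreed structure instead of silently storing them.
        allowed = {f.name for f in fields(cls)}
        unknown = set(data) - allowed
        if unknown:
            raise ValueError(f"unknown metadata fields: {sorted(unknown)}")
        return cls(**data)

meta = RulesMetadata(author="Jane Doe", target_urls=["https://example.com"])
json_ready = asdict(meta)  # plain dict, ready to be written next to the rules
```

The trade-off is the one debated in this thread: a fixed schema makes the data usable by tools, but every field becomes a commitment.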
As we don't have an actual use for it yet, I'd prefer to postpone it. It is subject to change, and a change will be costly in the future because some people may already be using it. For example, think about when we actually implement it because we need it (like in the cool hub idea). We may conclude that it would have been better to store the author
info as a dict {'email': .., 'name':...}
for working with APIs instead of this (it's just an example). But we couldn't change it easily because of backward compatibility.
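To make that cost concrete: if files with the flat author/author_email layout already exist and the nested dict from the example above is adopted later, every loader has to carry a compatibility shim like the one below. It is a hypothetical sketch (normalize_author is an invented name), and the nested layout is only the illustrative one mentioned in the comment.

```python
def normalize_author(metadata: dict) -> dict:
    """Return metadata with 'author' stored as {'name': ..., 'email': ...}.

    Hypothetical migration: older files use flat 'author' / 'author_email'
    strings, newer ones a nested 'author' dict.
    """
    result = dict(metadata)  # shallow copy; the caller's dict is left untouched
    author = result.get("author")
    if isinstance(author, dict):
        return result  # already in the new layout
    # Old layout (or no author at all): fold both flat keys into one dict.
    result["author"] = {"name": author, "email": result.pop("author_email", None)}
    return result
```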
I agree that we will need this info in the future, but it's better to approach it once we have completely worked it out and know what we need and why we need it. :)
Sure, no issues. Got your point; let's postpone this.
This new PR allows users to add a metadata dictionary and save/load it. Since the metadata is a generic dict, users are free to add any kind of metadata, for example author, license or description. This gives the learnt rules an identity (useful for those who publish their work). set_metadata() and get_metadata() bring in these features, and load() and save() store and restore the metadata along with the rules.
A metadata field would be useful: we can save any sort of information along with the rules. In the future, if you want to add other fields to the saved representation, you can include them in the metadata field without making any major change to the codebase.
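A minimal sketch of how set_metadata(), get_metadata(), save() and load() could fit together, assuming a JSON file with separate rules and metadata keys. Only the four method names come from the PR description; RuleStore, the file layout and the rule shape are hypothetical, not the PR's actual code.

```python
import json
from typing import Any, Dict, List, Optional

class RuleStore:
    """Hypothetical stand-in for the scraper class."""

    def __init__(self) -> None:
        self.rules: List[Dict[str, Any]] = []  # learnt rules (placeholder shape)
        self._metadata: Dict[str, Any] = {}

    def set_metadata(self, metadata: Dict[str, Any]) -> None:
        if not isinstance(metadata, dict):
            raise TypeError("metadata must be a dict")
        self._metadata = dict(metadata)

    def get_metadata(self) -> Dict[str, Any]:
        return dict(self._metadata)  # copy so callers can't mutate internal state

    def save(self, path: str, metadata: Optional[Dict[str, Any]] = None) -> None:
        # A non-empty argument wins; otherwise fall back to the stored metadata.
        data = {"rules": self.rules, "metadata": metadata or self._metadata}
        with open(path, "w", encoding="utf-8") as f:
            json.dump(data, f)

    def load(self, path: str) -> None:
        with open(path, encoding="utf-8") as f:
            data = json.load(f)
        self.rules = data["rules"]
        # Files saved before metadata existed simply have no "metadata" key.
        self._metadata = data.get("metadata", {})
```

Keeping metadata under its own top-level key means files written without it still load, which is what lets new fields be added later without major changes.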