Closed: mariosasko closed this 3 years ago
Absolutely agreed. Removing WIP modules would also make the codebase cleaner, and not only for English-speaking users.
I propose removal of the following packages/modules:
Please note that I am not claiming that the packages/modules listed above need to be removed completely. If they are not hosted on any other platform, we can create a separate repo to hold them there.
One more thing: if we remove preproc/stemmer/*, we no longer need zip_safe=False in setup.py, because no files holding data will be left. Setting zip_safe to True is preferred whenever possible. This stemmer (Python source) is already published online.
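A minimal sketch of how one could check that claim before flipping zip_safe: list every non-Python file left under a package directory. The helper name find_data_files is hypothetical (it is not part of podium), and treating "anything that is not .py/.pyc" as a data file is an assumption.

```python
from pathlib import Path


def find_data_files(package_dir):
    """Return the non-Python files under package_dir as sorted relative paths.

    Assumption: any file that is not .py/.pyc counts as a data file.
    """
    root = Path(package_dir)
    return sorted(
        path.relative_to(root).as_posix()
        for path in root.rglob("*")
        if path.is_file() and path.suffix not in {".py", ".pyc"}
    )
```

An empty result for the podium package directory after the removal would support dropping zip_safe=False; a non-empty one shows which files still need attention.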
One more reason to remove preproc/yake.py: right now, PyPI doesn't support links to online repos in the list of required packages in setup.py. The yake library relies on this, but we don't really need this component (it doesn't interact with podium at all).
Adding a few things here. Instead of complete removal, since some downstream dependencies already depend on these libraries (and more will depend soon), let's try to move most of them to a new repository (something like podium-models).
I propose that only podium/metrics/metrics.py is removed. I propose moving the following modules to the other (non-core) repository:
podium/preproc/lemmatizer/
podium/preproc/stemmer/
podium/preproc/yake.py
podium/preproc/util.py (used only by lemmatizer and stemmer)
podium/dataload/eurovoc.py
podium/dataload/ner_croatian.py
podium/models/impl/blcc/chain_crf.py
podium/models/impl/blcc_model.py
podium/models/eurovoc_models/multilabel_svm.py
Migrating these modules to a separate repository should also allow migrating the classes SCPDownloader (podium/storage/resources/downloader.py) and SCPLargeResource (podium/storage/resources/large_resource.py), which should significantly simplify the downloading and large resource modules.
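Before moving the modules listed above, it may help to see which remaining files still import them. The following is a rough sketch using the standard ast module; find_imports_of is a hypothetical helper, the dotted module names are derived from the paths in the list, and relative imports are deliberately skipped as a simplification.

```python
import ast


def find_imports_of(source, targets):
    """Return the sorted subset of `targets` that `source` imports.

    A target matches an import of the module itself or of any submodule.
    Relative imports (from . import x) are ignored in this sketch.
    """
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # "from a.b import c" may refer to module a.b or to module a.b.c
            names = [node.module] + [
                f"{node.module}.{alias.name}" for alias in node.names
            ]
        else:
            continue
        for name in names:
            for target in targets:
                if name == target or name.startswith(target + "."):
                    found.add(target)
    return sorted(found)
```

Running this over every file that stays in the core repository would give a list of references to clean up, for example imports of podium.preproc.util from modules that are not being moved.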
More or less done IMO with the recent private transfer & changes.
IMO, there are some modules/packages that don't add any value to the project or are of low interest to English-speaking users. By removing them, the codebase gets cleaner and we no longer have to maintain such modules/packages. If we decide to keep them, our project will not have a clear direction. Consequently, it could affect the number of users in the long run.
The removal of a module/package is a 5-step process:
... __init__.py in the parent directory and delete the related entries.
... setup.py.
I'll edit this post to discuss and stage the module/package removal.
Feel free to comment.
cc @mttk @FilipBolt @ivansmokovic