Closed mariomeissner closed 2 years ago
For documentation, you could probably write something new on this page by writing something in the nlp folder
Arguably, there should be a separate page for this, but I don't have the credentials to approve or reject that type of change.
So I mentioned it inside "Custom Data Files" for now, but I agree that it's not really the right home for it (this is not about custom files). For now I guess it suffices!
Merging #214 (1fa0133) into master (4efda7b) will decrease coverage by 0%. The diff coverage is 42%.
@@ Coverage Diff @@
## master #214 +/- ##
=====================================
- Coverage 86% 86% -0%
=====================================
Files 70 70
Lines 1549 1557 +8
=====================================
+ Hits 1337 1340 +3
- Misses 212 217 +5
In the documentation, I would add that this works for test and train files too. Something like:
Additionally, this works for test and train files as well:
++dataset.cfg.train_subset_name=name_of_subset
++dataset.cfg.test_subset_name=name_of_subset
Besides that, I don't see anything else to add.
Sorry I missed your last comment @mathemusician. I now added what you suggested. It seems the CI is failing, but for something unrelated? Let me know if there's anything further to do.
@mariomeissner Just saw this. @Borda I can fix the CI issue, it's just an import problem. @clementpoiret changed _TORCH_MAX_VERSION_1_8_1 to _TORCH_MAX_VERSION_SPARSEML.
Thank you for this :)
This PR addresses the issue and closes #213.
I modify the load_dataset function in core/nlp/data.py to find and use special subset names if provided, then rename them back to the standard names to avoid failures further along.
This cannot be done in the [subset]_dataloader function as initially proposed, because other functions such as _select_samples are called before that (and would not work well).
Two main reasons for patching core/nlp/data.py instead of core/data.py: load_dataset is not present in core/data.py. One could create a function there and then let the function in core/nlp/data.py call that first...
It's up for debate whether this should really be moved to core/data.py instead.
This PR should likely also include documentation changes, but I haven't checked yet how to do that. Pointers appreciated.
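For readers following along, here is a minimal sketch of the renaming idea described above. It is not the code from this PR: the function name rename_custom_subsets, the custom_names mapping, and the standard split names "train", "validation" and "test" are all assumptions made for illustration, and a plain dict stands in for the loaded dataset.

```python
from typing import Any, Dict, Optional

# Assumed standard split names; the real project may use different ones.
STANDARD_SPLITS = ("train", "validation", "test")


def rename_custom_subsets(
    dataset: Dict[str, Any],
    custom_names: Dict[str, Optional[str]],
) -> Dict[str, Any]:
    """Return a copy of `dataset` whose custom subsets use standard names.

    `custom_names` maps a standard split name to the subset that should
    play that role, e.g. {"validation": "validation_matched"}.
    Hypothetical helper, not part of any library.
    """
    # Keep only the entries that actually request a rename.
    wanted = {
        standard: custom
        for standard, custom in custom_names.items()
        if standard in STANDARD_SPLITS and custom and custom != standard
    }

    # Fail early if a requested subset does not exist.
    for custom in wanted.values():
        if custom not in dataset:
            raise KeyError(f"Subset '{custom}' not found in dataset")

    # Start from every subset that is not being renamed...
    consumed = set(wanted.values())
    renamed = {name: data for name, data in dataset.items() if name not in consumed}

    # ...then expose each custom subset under its standard name.
    for standard, custom in wanted.items():
        renamed[standard] = dataset[custom]
    return renamed
```

Renaming at load time, rather than in a per-subset dataloader, means that any step running in between (such as the _select_samples call mentioned above) only ever sees the standard names.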
I tested the following commands successfully: