Issue
Exporting to a directory such as corpora/ with bin/baleen export corpora results in an error like:
[Errno 2] No such file or directory: 'corpora/corpora/cooking/5b2d180b7af8b43e439b59b0.json'
This is a path expansion bug: the second corpora/ in the path should not be there.
Resolution
The fix is straightforward. In version v0.3.3-85-g88d5d7c, line 211, remove the self.root argument from the os.path.join call.
So, for the block that reads:
    for post, category in tqdm(self.posts(), total=Post.objects.count(), unit="docs"):
        path = os.path.join(
            self.root, catdir[category], "{}.{}".format(post.id, self.scheme)
        )
the revision should be:
    for post, category in tqdm(self.posts(), total=Post.objects.count(), unit="docs"):
        path = os.path.join(
            catdir[category], "{}.{}".format(post.id, self.scheme)
        )
This change results in the desired behavior on export.
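To see why the extra argument doubles the directory, here is a minimal standalone sketch. It is not the Baleen exporter itself: root, scheme, post_id, and the way catdir is built are hypothetical stand-ins, and it assumes each catdir entry already has the export root joined in, which is consistent with the error message above.

    import os

    # Hypothetical stand-ins for the exporter's state; not taken from Baleen.
    root = "corpora"
    scheme = "json"
    post_id = "5b2d180b7af8b43e439b59b0"

    # Assumption: each catdir entry was already built as root + category.
    catdir = {"cooking": os.path.join(root, "cooking")}

    filename = "{}.{}".format(post_id, scheme)

    # Original call: root is joined a second time, doubling the directory.
    buggy = os.path.join(root, catdir["cooking"], filename)

    # Revised call: catdir already carries the root, so it is not repeated.
    fixed = os.path.join(catdir["cooking"], filename)

    print(buggy)
    print(fixed)

Under these assumptions, the first path contains corpora twice and the second contains it once. Note that os.path.join discards earlier components when a later one is absolute, so the duplication would only show up when the export directory is given as a relative path, as in the command above.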
Thanks @agodbehere for the bug report and the clear solution! You're right, there was a duplication of self.root in catdir[category]; I've implemented the change you suggested.