Closed stefan-it closed 3 months ago
Pinging @MaxDall for help :)
For now I came up with the following solution:
Thanks @stefan-it for pointing this out!
I think it would be good for Fundus to offer support for serializing articles. We'd need some helper methods to serialize/deserialize articles. JSON seems like a good fit since it is human-readable. @addie9800 what do you think?
I definitely agree, also since we are already using JSON to represent the parsed articles within our tests. @MaxDall has also already started working on a solution implementing it.
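To make the idea concrete, here is a minimal sketch of what such serialize/deserialize helpers plus a JSONL export could look like. It only uses the standard library. `ArticleRecord` is a hypothetical stand-in invented for this example and is not the Fundus `Article` class; the field names and the helper names `to_json`, `from_json`, `write_jsonl` and `read_jsonl` are likewise assumptions, not an existing Fundus API.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime
from typing import Iterable, Iterator, List, Optional


@dataclass
class ArticleRecord:
    """Hypothetical stand-in for a parsed article (NOT the Fundus Article class)."""

    title: str
    plaintext: str
    url: str
    authors: List[str] = field(default_factory=list)
    publishing_date: Optional[datetime] = None


def to_json(record: ArticleRecord) -> str:
    """Serialize one record to a single-line JSON string."""
    data = asdict(record)
    # datetime is not JSON-serializable, so store it as an ISO 8601 string.
    if record.publishing_date is not None:
        data["publishing_date"] = record.publishing_date.isoformat()
    # ensure_ascii=False keeps umlauts readable; newlines in the text are
    # escaped by json.dumps, so each record stays on one line.
    return json.dumps(data, ensure_ascii=False)


def from_json(line: str) -> ArticleRecord:
    """Deserialize one JSON line back into a record."""
    data = json.loads(line)
    if data.get("publishing_date") is not None:
        data["publishing_date"] = datetime.fromisoformat(data["publishing_date"])
    return ArticleRecord(**data)


def write_jsonl(records: Iterable[ArticleRecord], path: str) -> int:
    """Write one JSON object per line and return the number of records written."""
    count = 0
    with open(path, "w", encoding="utf-8") as fh:
        for record in records:
            fh.write(to_json(record) + "\n")
            count += 1
    return count


def read_jsonl(path: str) -> Iterator[ArticleRecord]:
    """Lazily read records back from a JSONL file, skipping blank lines."""
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            if line.strip():
                yield from_json(line)
```

Since `write_jsonl` accepts any iterable, it could be fed from a generator while crawling, so articles would not need to be held in memory. How a Fundus `Article` maps onto such a record is left open here, since it depends on which attributes should be kept.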
Question
Hi,
many thanks for releasing this great crawler! Particularly, the number of supported German publishers is amazing - I am planning to collect some articles for LM pretraining.
I opened this issue because I couldn't find an example in the docs: what is the best and recommended way to export articles into e.g. a jsonl file? I could think of adding a `to_json` function to an `Article` object and then writing it to a file :thinking: But it would be great if the documentation could also cover exporting articles :)
Many thanks in advance!