Targoman / PersianWebScraper

An accurate scraper for popular Persian websites, mostly intended as a tool for creating large corpora for the Persian language.
GNU Lesser General Public License v3.0

Are Python scripts acceptable? #122

Open mnwato opened 4 months ago

mnwato commented 4 months ago

Judging by the current scraper, your main programming language is TypeScript. Is it possible to develop a crawler/scraper in another language, such as Python?

ziabary commented 4 months ago

It would be hard to reimplement it in Python. Why do you need to reimplement it?

mnwato commented 4 months ago

No, I didn't mean rewriting it in another language. Based on the contribution requirements published at the link below, I was thinking about creating a specific scraper. That's why I asked. https://oss.targoman.ir/TLPC

ziabary commented 4 months ago

I think you could contribute better by writing Python scripts that preprocess the jsonl.gz files, in order to ease the training process.
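
A minimal sketch of what such a preprocessing script might look like, assuming each line of a jsonl.gz file holds one JSON object. The `text` field name, the helper names, and the file paths are hypothetical and are not taken from the scraper's actual output schema; adjust them to the real data.

```python
import gzip
import json
from pathlib import Path
from typing import Iterator


def read_jsonl_gz(path: Path) -> Iterator[dict]:
    """Yield one parsed JSON object per non-empty line of a gzipped JSONL file."""
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:
                yield json.loads(line)


def normalize_whitespace(text: str) -> str:
    """Collapse runs of whitespace into single spaces and trim both ends.

    str.split() only splits on whitespace, so the zero-width non-joiner
    (U+200C) used in Persian orthography is left untouched.
    """
    return " ".join(text.split())


def preprocess(src: Path, dst: Path, text_field: str = "text") -> int:
    """Copy records from src to dst, dropping those without usable text.

    `text_field` is an assumed field name. Returns the number of records kept.
    """
    kept = 0
    with gzip.open(dst, "wt", encoding="utf-8") as out:
        for record in read_jsonl_gz(src):
            text = record.get(text_field)
            if not isinstance(text, str):
                continue  # field missing or not a string
            text = normalize_whitespace(text)
            if not text:
                continue  # nothing left after cleanup
            record[text_field] = text
            # ensure_ascii=False keeps Persian characters readable in the output
            out.write(json.dumps(record, ensure_ascii=False) + "\n")
            kept += 1
    return kept
```

It could be called as `preprocess(Path("input.jsonl.gz"), Path("clean.jsonl.gz"))`, with both paths being placeholders. Streaming line by line keeps memory use flat even for large corpus files.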