Hi guys @DeepSeekPH, thanks so much for sharing such excellent work. I noticed that OpenWebMath uses a specialized pipeline to extract content from the raw HTML instead of directly using the WET files from Common Crawl. I wonder how you dealt with this problem. Did you also follow OpenWebMath and process the HTML with your own private pipeline? Looking forward to your feedback.