There are thousands of Tamil blogs, most of them no longer active. Crawling them is an opportunity both to build a dataset and to preserve the blog content.
Loop through each publicly available blog and get a JSON representation of each post (including its metadata). Post-process the results into one large CSV.
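A minimal sketch of the JSON-to-CSV step, in Python. It assumes the blogs are hosted on Blogger and expose the usual JSON feed at `/feeds/posts/default?alt=json`, with text values nested under a `$t` key; blogs on other platforms would need their own fetcher. The function names and the CSV column set are hypothetical, chosen for illustration only.

```python
import csv
import io
import json
import urllib.parse
import urllib.request

# Hypothetical column set for the final CSV; adjust to the metadata you need.
FIELDS = ["blog", "url", "title", "author", "published", "updated", "labels", "content"]


def feed_url(blog_host, start_index=1, max_results=50):
    """Build the (assumed) Blogger JSON feed URL for one page of posts."""
    query = urllib.parse.urlencode(
        {"alt": "json", "start-index": start_index, "max-results": max_results}
    )
    return f"https://{blog_host}/feeds/posts/default?{query}"


def fetch_entries(blog_host, start_index=1, max_results=50):
    """Download one page of the feed and return its list of post entries."""
    url = feed_url(blog_host, start_index, max_results)
    with urllib.request.urlopen(url, timeout=30) as resp:
        data = json.load(resp)
    return data.get("feed", {}).get("entry", [])


def text(node):
    """Feed text values are assumed to be wrapped as {"$t": "..."}."""
    return node.get("$t", "") if isinstance(node, dict) else ""


def entry_to_row(blog_host, entry):
    """Flatten one post entry into a dict keyed by FIELDS."""
    links = entry.get("link", [])
    url = next((l.get("href", "") for l in links if l.get("rel") == "alternate"), "")
    authors = entry.get("author", [])
    return {
        "blog": blog_host,
        "url": url,
        "title": text(entry.get("title")),
        "author": text(authors[0].get("name")) if authors else "",
        "published": text(entry.get("published")),
        "updated": text(entry.get("updated")),
        "labels": "|".join(c.get("term", "") for c in entry.get("category", [])),
        "content": text(entry.get("content")) or text(entry.get("summary")),
    }


def rows_to_csv_text(rows):
    """Serialize flattened rows to CSV text with a header line."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()
```

To cover a whole blog, call `fetch_entries` repeatedly, advancing `start_index` by `max_results` until an empty list comes back, and append each page's rows before writing the CSV. Adding a pause between requests keeps the crawl polite.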
It would be good if associated media files within the same domain could also be downloaded, but this is optional.
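For the optional media step, one possible approach is to scan each post's HTML for media tags and keep only the URLs on the blog's own host. The sketch below uses Python's standard `html.parser`; the class and function names are hypothetical, and the choice of tags (`img`, `audio`, `video`, `source`) is an assumption about what counts as media.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

MEDIA_TAGS = ("img", "audio", "video", "source")


class MediaCollector(HTMLParser):
    """Collect absolute src URLs from media tags in a post's HTML."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag in MEDIA_TAGS:
            src = dict(attrs).get("src")
            if src:
                # Resolve relative paths against the post's own URL.
                self.urls.append(urljoin(self.base_url, src))


def same_domain_media(post_url, html):
    """Return media URLs in html whose host matches the post's host."""
    collector = MediaCollector(post_url)
    collector.feed(html)
    host = urlparse(post_url).hostname
    return [u for u in collector.urls if urlparse(u).hostname == host]
```

Note that a strict host match may be too narrow in practice, since a blog platform may serve uploaded images from a separate host; in that case the filter would need an explicit allow-list of media hosts.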
http://tamilpoint.blogspot.com/p/tamil-blogs.html