Open: dieko95 opened this issue 3 years ago
From the first PoC with El Pitazo, it looks like we can extract the title, content, date, author, categories, and tags. It would be good to check whether our next sources expose these fields as well.
In the meantime, the VP counts only on the content of each post. If the design changes, we will post an update here.
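Here is a minimal sketch of what one scraped post could look like, assuming a Python dataclass whose field names simply mirror the fields listed above. `ScrapedPost`, `missing_fields` and `to_vp_record` are hypothetical names, not part of the project. `missing_fields` reports which fields a new source left empty, which is one way to run the per-source check suggested above. `to_vp_record` keeps only the content the VP relies on.

```python
import datetime
from dataclasses import dataclass, field, fields
from typing import List, Optional


@dataclass
class ScrapedPost:
    # Hypothetical record for one scraped post; the field names mirror
    # what the El Pitazo PoC appeared to expose.
    content: str  # the only field the VP counts on
    title: Optional[str] = None
    date: Optional[datetime.date] = None
    author: Optional[str] = None
    categories: List[str] = field(default_factory=list)
    tags: List[str] = field(default_factory=list)

    def missing_fields(self) -> List[str]:
        """Return the names of fields this source left empty (None, "" or [])."""
        return [f.name for f in fields(self) if getattr(self, f.name) in (None, "", [])]

    def to_vp_record(self) -> dict:
        """Keep only what the VP relies on: the post content."""
        return {"content": self.content}


post = ScrapedPost(content="Texto de la nota", title="Titular", tags=["economia"])
print(post.missing_fields())  # ['date', 'author', 'categories']
print(post.to_vp_record())    # {'content': 'Texto de la nota'}
```

Because every field except `content` is optional, a source that only exposes the body of a post still produces a valid record.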
@marianelamin Gotcha! Thanks a lot for the update 🙌
Problem
We haven't yet defined the schema of the flattened dataset that will be consumed by the huggingface transformer.
Proposed Solution
Define the schema of the training dataset that will be used to train the huggingface transformer: the column names (text, news_title, location, issue, source_type, author, etc.) and their data types (varchar, int, float, etc.).
Deliverable
A readme.md documenting the dataset's schema.
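Here is a minimal sketch of how the schema could be pinned down in code and rendered into the readme.md. It uses only the Python standard library. The column names come from this issue. The type assigned to each column (varchar for all of them) is an assumption still to be settled. `SCHEMA`, `validate_row` and `schema_as_markdown` are hypothetical names.

```python
from typing import Dict, List, Tuple

# Hypothetical schema: column names are from the issue, but the type of
# each column is an assumption to be confirmed in the readme.
SCHEMA: List[Tuple[str, str]] = [
    ("text", "varchar"),
    ("news_title", "varchar"),
    ("location", "varchar"),
    ("issue", "varchar"),
    ("source_type", "varchar"),
    ("author", "varchar"),
]

# Python type used to check each declared schema type.
PY_TYPES: Dict[str, type] = {"varchar": str, "int": int, "float": float}


def validate_row(row: Dict[str, object]) -> List[str]:
    """Return the problems found in one flattened row (empty list if valid)."""
    problems = []
    for name, kind in SCHEMA:
        if name not in row:
            problems.append(f"missing column: {name}")
        elif not isinstance(row[name], PY_TYPES[kind]):
            problems.append(f"{name}: expected {kind}, got {type(row[name]).__name__}")
    # Columns present in the row but not declared in the schema.
    extra = set(row) - {name for name, _ in SCHEMA}
    problems.extend(f"unexpected column: {name}" for name in sorted(extra))
    return problems


def schema_as_markdown() -> str:
    """Render the schema as a Markdown table for the readme.md deliverable."""
    lines = ["| column | type |", "| --- | --- |"]
    lines += [f"| {name} | {kind} |" for name, kind in SCHEMA]
    return "\n".join(lines)


print(schema_as_markdown())
```

Keeping the schema as a single list means the readme table and the row checks cannot drift apart. If the dataset is later loaded with the Hugging Face `datasets` library, the same columns could be declared there with `Features` and `Value` types instead.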