We need to add a new data model table for ContentSource and update the Content model to handle different source types with their own specific extraction logic.
New ContentSource model
Add the following new model:
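import enum

from sqlalchemy import JSON, Column, Enum, ForeignKey, Integer, String
from sqlalchemy.orm import relationship

class ContentSourceType(enum.Enum):
    URL = "url"
    PATH = "path"
    HUGGING_FACE = "hugging_face"

class ContentSource(Base):  # Base is the project's existing declarative base
    __tablename__ = "content_sources"
    id = Column(Integer, primary_key=True, index=True)
    content_id = Column(Integer, ForeignKey("contents.id"))
    type = Column(Enum(ContentSourceType))
    value = Column(String)  # URL, local path, or Hugging Face dataset reference
    # "metadata" is a reserved attribute name in SQLAlchemy's Declarative API,
    # so the attribute is renamed; the database column is still called "metadata".
    source_metadata = Column("metadata", JSON, nullable=True)  # additional source-specific data
    content = relationship("Content", back_populates="sources")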
Tasks
[ ] Add ContentSourceType enum
[ ] Add ContentSource model
[ ] Update Content model to remove url column and add sources relationship
[ ] Update database migration scripts
[ ] Update any relevant API endpoints or services that interact with Content model
[ ] Add tests to reflect the new model structure
[ ] Update documentation for the data model
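The Content-side change from the task list could look roughly like the sketch below. It is an illustration under assumptions, not the project's actual model: the `title` column, the `demo` helper, the in-memory SQLite engine, and the `cascade` setting are all hypothetical, and the real `Content` model has other columns that are not shown here. The JSON attribute is named `source_metadata` because SQLAlchemy reserves the attribute name `metadata` on declarative classes.

```python
import enum

from sqlalchemy import JSON, Column, Enum, ForeignKey, Integer, String, create_engine
from sqlalchemy.orm import Session, declarative_base, relationship

Base = declarative_base()  # stand-in for the project's existing Base


class ContentSourceType(enum.Enum):
    URL = "url"
    PATH = "path"
    HUGGING_FACE = "hugging_face"


class Content(Base):
    __tablename__ = "contents"

    id = Column(Integer, primary_key=True, index=True)
    title = Column(String)  # hypothetical column, for illustration only
    # Replaces the old url column: one Content can now have many sources.
    sources = relationship(
        "ContentSource", back_populates="content", cascade="all, delete-orphan"
    )


class ContentSource(Base):
    __tablename__ = "content_sources"

    id = Column(Integer, primary_key=True, index=True)
    content_id = Column(Integer, ForeignKey("contents.id"))
    type = Column(Enum(ContentSourceType))
    value = Column(String)
    # Attribute renamed because "metadata" is reserved on declarative classes.
    source_metadata = Column("metadata", JSON, nullable=True)
    content = relationship("Content", back_populates="sources")


def demo():
    """Create one Content with two sources and read the sources back."""
    engine = create_engine("sqlite:///:memory:")
    Base.metadata.create_all(engine)
    with Session(engine) as session:
        content = Content(title="example")
        content.sources.append(
            ContentSource(type=ContentSourceType.URL, value="https://example.com/post")
        )
        content.sources.append(
            ContentSource(
                type=ContentSourceType.HUGGING_FACE,
                value="org/dataset",
                source_metadata={"split": "train"},
            )
        )
        session.add(content)
        session.commit()
        return [(s.type.value, s.value) for s in session.query(ContentSource)]
```

Calling `demo()` returns one `(type, value)` pair per stored source, which is a quick way to check that both sides of the relationship are wired together before writing the migration.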
Rationale
This change lets each source type (URL, local path, Hugging Face dataset) have its own extraction logic. Storing sources in a separate table also means a single Content record can reference several sources, which the single url column cannot express.
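One way the per-type extraction logic could be organized is a lookup table keyed by `ContentSourceType`. This is only a sketch: the extractor functions, the `EXTRACTORS` table, and the `extract` helper are hypothetical names, and the extractors return a description of the work instead of doing any real fetching or loading.

```python
import enum
from typing import Callable, Dict


class ContentSourceType(enum.Enum):
    URL = "url"
    PATH = "path"
    HUGGING_FACE = "hugging_face"


# Hypothetical extractors: each one only describes what it would do.
def extract_url(value: str) -> str:
    return f"fetch {value} over HTTP"


def extract_path(value: str) -> str:
    return f"read local file {value}"


def extract_hugging_face(value: str) -> str:
    return f"load dataset {value} from the Hugging Face Hub"


# One entry per source type, so adding a type means adding one function.
EXTRACTORS: Dict[ContentSourceType, Callable[[str], str]] = {
    ContentSourceType.URL: extract_url,
    ContentSourceType.PATH: extract_path,
    ContentSourceType.HUGGING_FACE: extract_hugging_face,
}


def extract(source_type: ContentSourceType, value: str) -> str:
    """Dispatch to the extractor registered for the given source type."""
    try:
        extractor = EXTRACTORS[source_type]
    except KeyError:
        raise ValueError(f"no extractor registered for {source_type!r}") from None
    return extractor(value)
```

With this layout, `extract(source.type, source.value)` works for any stored `ContentSource` row, and a type without a registered extractor fails with a `ValueError` instead of being silently skipped.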