Converted the content field to include link text and URLs in markdown format, so that outlinks that were previously missed are captured
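A minimal sketch of the link conversion, assuming the raw post content is HTML with `<a href="...">` anchors (the actual raw format and the transformer's own code may differ). `LinkToMarkdown` and `html_to_markdown_links` are hypothetical names used only for illustration:

```python
from html.parser import HTMLParser


class LinkToMarkdown(HTMLParser):
    """Collect text, rewriting <a href="..."> elements as [text](url)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []      # output fragments, joined at the end
        self.href = None     # href of the anchor currently open, if any
        self.link_text = []  # text seen inside the open anchor

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.href = dict(attrs).get("href")
            self.link_text = []
        elif tag == "br":
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag == "a" and self.href is not None:
            text = "".join(self.link_text)
            self.parts.append(f"[{text}]({self.href})")
            self.href = None

    def handle_data(self, data):
        # Text inside an anchor becomes the link text; everything else
        # is copied through unchanged.
        if self.href is not None:
            self.link_text.append(data)
        else:
            self.parts.append(data)


def html_to_markdown_links(html):
    parser = LinkToMarkdown()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)
```

Keeping the URL next to the link text means an outlink hidden behind text such as "this post" still shows up in the content field.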
The issue (https://github.com/bellingcat/cisticola/issues/61) with the Rumble channel info transformer is that old raw_channel_info scrapes don't include the id field (but more recent scrapes do). I think it makes sense to add the id field to the old raw_channel_info entries manually; the transformer should work from then on.
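The manual backfill could look roughly like the sketch below. It assumes each raw_channel_info entry is stored as a JSON string and that the correct id for the channel is already known; `backfill_channel_id` and the example id values are hypothetical:

```python
import json


def backfill_channel_id(raw_json, platform_id):
    """Return the raw channel info JSON with an "id" key added if missing."""
    info = json.loads(raw_json)
    # Entries from recent scrapes already carry an id and are left as they are.
    info.setdefault("id", platform_id)
    return json.dumps(info)


old_entry = '{"name": "example"}'
new_entry = '{"id": "c-999", "name": "example"}'
patched_old = json.loads(backfill_channel_id(old_entry, "c-123"))
patched_new = json.loads(backfill_channel_id(new_entry, "c-123"))
```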
Modified the insert_or_select method to handle channels with missing fields
Allowed selecting Channel instances with a null-valued platform_id
Performed session.flush() after inserting a channel to avoid an issue where the autoincremented id field wasn't updated on the Channel object.
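To illustrate the two points above in plain SQL terms: an equality test against NULL never matches, so a lookup on a null-valued platform_id needs a NULL-safe comparison, and the generated id has to be read back once the INSERT has actually been sent to the database. The sketch uses the stdlib sqlite3 module rather than the project's SQLAlchemy session, so `cursor.lastrowid` stands in for reading the id after session.flush(). The table layout and this `insert_or_select` signature are assumptions for illustration only:

```python
import sqlite3


def insert_or_select(conn, name, platform_id):
    """Return the id of the matching channel row, inserting it if absent."""
    # "IS" matches NULL against NULL in SQLite, unlike "=".
    row = conn.execute(
        "SELECT id FROM channel WHERE name = ? AND platform_id IS ?",
        (name, platform_id),
    ).fetchone()
    if row is not None:
        return row[0]
    cur = conn.execute(
        "INSERT INTO channel (name, platform_id) VALUES (?, ?)",
        (name, platform_id),
    )
    # The autoincremented id is only known after the INSERT has executed.
    return cur.lastrowid


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE channel ("
    "id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT, platform_id TEXT)"
)
first = insert_or_select(conn, "example", None)
second = insert_or_select(conn, "example", None)
row_count = conn.execute("SELECT COUNT(*) FROM channel").fetchone()[0]
```

The second call finds the row created by the first one instead of inserting a duplicate, even though platform_id is NULL.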
Expanded transformer test functionality
Implemented tests for channel info scraping and transforming
Added the 'source': 'researcher' key-value pair to *_CHANNEL_KWARGS in conftest.py to satisfy the condition where(Channel.source == 'researcher') in cisticola.scraper.base.ScraperController.
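A rough sketch of why the extra key matters. The `Channel` dataclass, `EXAMPLE_CHANNEL_KWARGS` and `researcher_channels` below are simplified stand-ins, not the project's ORM model, fixtures or controller code; the list filter only mimics the where(Channel.source == 'researcher') condition:

```python
from dataclasses import dataclass


@dataclass
class Channel:
    # Simplified stand-in for the ORM model; the real one has more fields.
    name: str
    platform: str
    source: str


# Hypothetical fixture kwargs in the style of *_CHANNEL_KWARGS.
EXAMPLE_CHANNEL_KWARGS = {
    "name": "example",
    "platform": "Telegram",
    "source": "researcher",
}


def researcher_channels(channels):
    """Keep only channels whose source is 'researcher'."""
    return [c for c in channels if c.source == "researcher"]


channels = [
    Channel(**EXAMPLE_CHANNEL_KWARGS),
    Channel(name="other", platform="Telegram", source="other"),
]
selected = researcher_channels(channels)
```

Without 'source': 'researcher' in the kwargs, a test channel would be filtered out the same way the second channel is here.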
Fixed problems with channel info transformers (the Rumble issue described above)
Made improvements to Telegram post transformer, including the author_username, url and author_id fields