Closed bkrdmr closed 2 years ago
> so reply_id and urls columns are empty
Are the columns present, but empty (i.e., they have comma delimiters but empty strings)? The error looks like the columns aren't present in the file, or have been mangled somehow by the import code.
Out of interest - are you converting JSON data collected via the YouTube API into a CSV and using that? If you can share the code doing the JSON -> CSV conversion (just a gist or something) I might be able to add native support for the format, similar to the Twitter format.
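One way to tell the two cases apart (columns absent vs. present-but-blank) is to inspect the CSV header directly. A minimal sketch, using a hypothetical inline sample rather than the real file, and assuming the column layout described in this thread:

```python
import csv
import io

# Hypothetical sample row: reply_to and urls are present in the header
# but empty in the data, which is different from the columns being absent.
sample = (
    "comment_id,commenter_id,commenter_name,video_id,reply_to,"
    "comment_displayed,published_date,urls\n"
    "c1,u1,Alice,v1,,hello,1609459200,\n"
)

expected = ["comment_id", "commenter_id", "commenter_name", "video_id",
            "reply_to", "comment_displayed", "published_date", "urls"]

reader = csv.DictReader(io.StringIO(sample))
row = next(reader)
missing = [c for c in expected if c not in reader.fieldnames]  # absent from file
empty = [c for c in expected if c in row and row[c] == ""]     # present but blank
print("missing:", missing)  # → missing: []
print("empty:", empty)      # → empty: ['reply_to', 'urls']
```

Pointing the `DictReader` at the real `comments.csv` instead of the inline sample would show which of the two situations the toolkit is actually seeing.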
Columns are present. I've tried with dummy values but got the same result. The data is stored in my lab's regular databases. I extracted it as CSV files and re-ordered the columns in pandas per your guideline, before saving it to a new CSV for preprocessing.
import pandas as pd

# Re-order the columns to match the expected layout
df = df[['comment_id', 'commenter_id', 'commenter_name', 'video_id', 'reply_to', 'comment_displayed', 'published_date']]
df['urls'] = ""
df['reply_to'] = ""
# Convert timestamps to Unix epoch seconds
df['published_date'] = pd.to_datetime(df['published_date'])
df['published_date'] = (df['published_date'] - pd.Timestamp("1970-01-01 00:00:00+00:00")) // pd.Timedelta('1s')
df.to_csv('comments.csv', index=False, encoding='utf-8')
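The epoch conversion in that snippet can be sanity-checked on a single known value. A minimal sketch, assuming the timestamps are timezone-aware UTC strings like the ones pandas produces:

```python
import pandas as pd

# 2021-01-01 00:00:00 UTC is 1609459200 seconds after the Unix epoch.
ts = pd.to_datetime(pd.Series(["2021-01-01 00:00:00+00:00"]))
epoch = (ts - pd.Timestamp("1970-01-01 00:00:00+00:00")) // pd.Timedelta("1s")
print(epoch.iloc[0])  # → 1609459200
```

If the source timestamps are naive (no timezone), subtracting the tz-aware epoch Timestamp raises a TypeError, so it's worth confirming which kind the database export contains.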
Thanks for confirming - I'll try and take a look at what's going on today or tomorrow.
Thank you! I will check again.
I had a quick look into this - I wonder if the problem is the CSV file is being misinterpreted within the toolkit?
I think two things to try are:
# Write only the first few rows, to get a small file for testing
df.head().to_csv('comments.csv', index=False, encoding='utf-8')
# quoting=1 is csv.QUOTE_ALL: wrap every field in quotes
df.to_csv('comments.csv', index=False, encoding='utf-8', quoting=1)
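The reason `quoting=1` can help: by default pandas only quotes fields that need it, and comment text containing commas or newlines can shift columns in a reader that parses the file naively. A minimal sketch of the difference, using a hypothetical comment containing a newline:

```python
import csv
import io

import pandas as pd

# Hypothetical comment text with an embedded newline, which can confuse
# some CSV readers unless every field is quoted.
df = pd.DataFrame({
    "comment_id": ["c1"],
    "comment_displayed": ["line one\nline two"],
})

buf = io.StringIO()
# quoting=1 is csv.QUOTE_ALL: every field, including the header, is quoted
df.to_csv(buf, index=False, quoting=csv.QUOTE_ALL)
print(buf.getvalue())
```

With `QUOTE_ALL`, the embedded newline stays safely inside a quoted field, so a conforming CSV parser reconstructs the row correctly.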
If neither of those helps, I might ask you to share an example file with me so I can debug it for you.
Alternatively, since you're already writing Python, you can cut out the CSV middleman and work directly from the dataframe by using the toolkit as a Python library. These functions are safe to use and aren't expected to change; I just haven't had time to write documentation beyond the snippet in the readme.
from coordination_network_toolkit.preprocess import preprocess_data
# Create a generator of pandas rows, since iterrows returns an index and the row content
rows = (row for (i, row) in df.iterrows())
preprocess_data('youtube_comments.db', rows)
Yes, it is now working. Using preprocess_data() solved the issue, so I guess something was wrong in the CSV. I do wonder, though, why you chose directed graphs instead of undirected graphs for co-retweet behaviour.
Thank you for the prompt response and quick fix. This is definitely helpful.
Hello, I have been experimenting with data from different social media platforms, following your guidelines in this repo. Lately, I've tried processing YouTube comments, where the reply_id and urls columns are empty, and I am seeing the following ValueError in the preprocessing phase. Do you have any suggestions for overcoming this?