Open havardl opened 2 years ago
btw, this is how I preprocess the .csv file and build the co-link network in Python:
import coordination_network_toolkit as coord_net_tk
import networkx as nx
coord_net_tk.preprocess.preprocess_csv_files(db_name, [csv_filename_path])
coord_net_tk.compute_networks.compute_co_link_network(db_name, 10, min_edge_weight=5, resolved=False)
G = coord_net_tk.graph.load_networkx_graph(db_name, "co_link")
nx.write_graphml_lxml(G, "filename.graphml")
When I preprocess a .csv file that contains tweets without URLs, I get more than 40 pairs of source/target combinations between profiles. But when I remove the tweets without links, the network is reduced to just a handful of profiles.
If your input is only tweets with no urls, there shouldn't be anything in the output co-link network, so something has gone wrong somewhere.
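For intuition, a co-link edge can only exist between two accounts that shared the same URL, so tweets with no URL can never contribute an edge. A minimal sketch of that idea (not the toolkit's actual implementation, and ignoring its time-window and min_edge_weight logic):

```python
from collections import defaultdict
from itertools import combinations

# Toy tweets as (account, url) pairs; url is None when the tweet has no link.
tweets = [
    ("alice", "https://example.com/a"),
    ("bob", "https://example.com/a"),
    ("carol", None),
    ("dave", None),
]

# Group accounts by the URL they shared.
sharers = defaultdict(set)
for account, url in tweets:
    if url:  # tweets without URLs can never contribute an edge
        sharers[url].add(account)

# Any two accounts that shared the same URL get a co-link edge.
edges = set()
for accounts in sharers.values():
    for a, b in combinations(sorted(accounts), 2):
        edges.add((a, b))

print(edges)  # {('alice', 'bob')} -- carol and dave never appear
```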
From a quick glance your preprocessing/data looks reasonable to me, but I'll take a closer look later when I have more time.
A few questions:
If your input is only tweets with no urls, there shouldn't be anything in the output co-link network, so something has gone wrong somewhere.
This is very helpful. I was wondering if the co-link network perhaps also looked at some other variables, but this makes me think I can just remove all the rows which have no URLs before preprocessing the .csv file. That way I'm sure the network is made up of link connections only.
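Dropping the rows without URLs before preprocessing can be done in pandas. A minimal sketch, assuming a column named "urls" that holds the expanded URL (the column name is an assumption; use whatever column your .csv stores it in):

```python
import pandas as pd

# Toy stand-in for the raw tweet export; the "urls" column name is assumed.
df = pd.DataFrame({
    "tweet_id": [1, 2, 3],
    "urls": ["https://example.com/a", "", "https://example.com/b"],
})

# Keep only the rows that actually carry a URL (treat NaN as empty).
with_urls = df[df["urls"].fillna("").str.len() > 0]

# Write the filtered file to feed into preprocess_csv_files.
with_urls.to_csv("tweets_with_urls.csv", index=False)
```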
To your questions:
I use the entities.urls.expanded_url field for a given tweet from the Twitter API.
I'm seeing a big difference between the two output networks depending on whether the .csv file I preprocess includes tweets without URLs.
This makes me wonder if I am processing my data in the wrong way when generating the .csv file. This is the current format of my csv file:
Is this the correct way of doing it?