Closed PiyushKyushu closed 2 years ago
Hi :) did you use get_urls_from_ct_histdata to extract the URLs from CrowdTangle's CSV list of posts? If so, you should also have the list of URLs that you started from. CooRnet collects all the shares of these URLs and stores them in ct_shares.df. Unlike your original CSV, this is a list of posts that shared your URLs across the entire platform tracked by CrowdTangle. The ct_shares_marked dataframe is created by get_coord_shares as part of its outputs. It includes two additional fields (is_coordinated and is_orig) and only includes the posts related to links that were shared at least two times (this is the reason ct_shares_marked is smaller than ct_shares).
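To make the size difference concrete, here is a minimal base R sketch of the "shared at least two times" filter on toy data. It is not CooRnet's actual code: the data frame, its column names (post_id, url), and the *_toy object names are assumptions made for illustration.

```r
# Toy stand-in for ct_shares.df: one row per post that shared a URL.
# Column names are illustrative and do not reflect CooRnet's real schema.
ct_shares_toy <- data.frame(
  post_id = 1:6,
  url = c("a.com/1", "a.com/1", "b.com/2", "c.com/3", "c.com/3", "c.com/3"),
  stringsAsFactors = FALSE
)

# Count how many posts shared each URL.
share_counts <- table(ct_shares_toy$url)

# Keep only the posts whose URL was shared at least two times.
urls_kept <- names(share_counts)[share_counts >= 2]
ct_shares_marked_toy <- ct_shares_toy[ct_shares_toy$url %in% urls_kept, ]

nrow(ct_shares_toy)         # all shares
nrow(ct_shares_marked_toy)  # fewer rows: the single share of b.com/2 is dropped
```

In this toy case the filtered data frame loses the one post whose URL was shared only once, which mirrors why ct_shares_marked ends up smaller than ct_shares.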
Hope I've answered your questions.
Best, Fabio
Hi Fabio,
Thank you for the explanation. It is very helpful, and the answer certainly enhances my understanding of ctshares and ct_shares_marked. The only thing I am still confused about is the URLs.
I collected Facebook posts from CrowdTangle (my_data_original.csv): 14,656 posts.
Then I used the code below:
urls <- get_urls_from_ct_histdata(ct_histdata_csv = "my_data_original.csv")
This produced a list of 7,448 URLs with dates.
You wrote: "did you use get_urls_from_ct_histdata to extract the URLs from CrowdTangle's CSV list of posts? If so, you should also have the list of URLs that you started from."
As you mentioned, this means that the URLs I got (7,448) came from my dataset (my_data_original.csv).
Where does this number come from? What are these URLs? I mean, if I inspect my dataset manually, where (under which field) can I find them?
In other words, from which field does CooRnet extract the URLs? Is it URL, Link, or some other field?
I apologise if my question seems naive or insignificant.
Thank you for the cooperation.
It's actually a good question about a poorly documented part of the package. The get_urls_from_ct_histdata function attempts to extract all the links from the CrowdTangle CSV of posts. The process is performed in lines 71 to 76 of the function code (https://github.com/fabiogiglietto/CooRnet/blob/master/R/get_urls_from_ct_histdata.R).
In other terms:
1) It starts from Final Link.
2) If Final Link is empty, it takes whatever is available in Link.
3) If the post is marked as a re-share in Link Text, the Link field is used.
4) If the post is not a Link type post, the function attempts to extract the link, if any, from the Message and Description fields.
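The four steps can be sketched in base R roughly as follows. This is one possible reading of the rules on toy data, not the code from the package: the column names (Type, Final.Link, Link, Link.Text, Message, Description), the re-share marker text, the URL regular expression, and the helper names first_url and pick_url are all assumptions for illustration.

```r
# Toy stand-in for a CrowdTangle CSV of posts. Column names and the
# re-share marker text below are assumptions, not taken from CooRnet.
posts <- data.frame(
  Type = c("Link", "Link", "Link", "Status"),
  Final.Link = c("https://final.example/a", "", "https://final.example/c", ""),
  Link = c("https://short.example/a", "https://link.example/b",
           "https://link.example/c", ""),
  Link.Text = c("Title A", "Title B", "This is a re-share of a post", ""),
  Message = c("", "", "", "Read this https://msg.example/d now"),
  Description = c("", "", "", ""),
  stringsAsFactors = FALSE
)

reshare_marker <- "This is a re-share of a post"  # assumed marker text
url_pattern <- "https?://[^[:space:]]+"            # simplistic URL pattern

# Hypothetical helper: first URL found in a string, or NA if there is none.
first_url <- function(text) {
  m <- regmatches(text, regexpr(url_pattern, text))
  if (length(m) == 0) NA_character_ else m
}

# Hypothetical helper: apply the four rules to a single one-row data frame.
pick_url <- function(row) {
  # Step 4: non-link posts fall back to a URL in Message, then Description.
  if (row$Type != "Link") {
    found <- first_url(row$Message)
    if (is.na(found)) found <- first_url(row$Description)
    return(found)
  }
  # Step 3: posts marked as re-shares use the Link field.
  if (row$Link.Text == reshare_marker) return(row$Link)
  # Steps 1 and 2: prefer Final Link, fall back to Link when it is empty.
  if (nzchar(row$Final.Link)) row$Final.Link else row$Link
}

posts$url <- vapply(
  seq_len(nrow(posts)),
  function(i) pick_url(posts[i, ]),
  character(1)
)
posts[, c("Type", "url")]
```

Each toy row exercises one rule, so inspecting posts$url shows which field supplied the URL in each case. For the exact behaviour, the linked source file remains the reference.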
Please also note that the CrowdTangle API link endpoint may return posts that include multiple links. This means that you may end up with links referenced in ct_shares.df that are not in your original list. To avoid using these links, set the is_orig parameter to TRUE in get_coord_shares.
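As a rough illustration of what restricting the shares to the original URLs means, here is a base R sketch on toy data. It does not call CooRnet and does not show how get_coord_shares implements the option: the objects original_urls, shares, and shares_orig_only and their columns are hypothetical.

```r
# Hypothetical list of original URLs (the starting list extracted from the CSV).
original_urls <- c("https://site.example/a", "https://site.example/b")

# Toy shares: post 1 contains two links, one of which is not in the original list.
shares <- data.frame(
  post_id = c(1, 1, 2, 3),
  url = c("https://site.example/a", "https://other.example/x",
          "https://site.example/b", "https://site.example/a"),
  stringsAsFactors = FALSE
)

# Flag the shares whose link matches one of the original URLs.
shares$is_orig <- shares$url %in% original_urls

# Keep only the shares of original URLs.
shares_orig_only <- shares[shares$is_orig, ]
shares_orig_only
```

The extra link carried by post 1 is flagged FALSE and dropped, while the shares of the original URLs are kept.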
Best, Fabio
Thank you so much for the explanation Fabio.
Hi,
I have a dataset of 13,622 rows and 41 columns, which I collected using CrowdTangle's historical data option. I used this dataset to detect coordinated link sharing behaviour with CooRnet.
The output includes ctshares with 98,731 rows and 35 columns and ct_shares_marked with 13,109 rows and 37 columns.
I don't understand how and why these three datasets (my original dataset, ctshares, and ct_shares_marked) differ from each other.
I found one already closed issue, https://github.com/fabiogiglietto/CooRnet/issues/22, that is about ctshares and ct_shares_marked, but I don't understand what is being said there.
It would be very helpful if you could explain the differences between the three datasets.
Thank you in advance!