Open TimBMK opened 2 years ago
@TimBMK Thanks for reporting the bug. I can reproduce this.
require(academictwitteR)
#> Loading required package: academictwitteR
users <- c("303234771", "2821282972", "84803032", "154096311", "2615232002", "37776042", "2282315483", "405599246", "1060861584938057728", "85161049")
tempdir <- academictwitteR:::.gen_random_dir()
get_user_timeline(x = users,
                  start_tweets = "2017-04-01T00:00:00Z",
                  end_tweets = "2017-06-01T00:00:00Z",
                  n = 3200,
                  data_path = tempdir,
                  bind_tweets = FALSE,
                  verbose = FALSE)
#> data frame with 0 columns and 0 rows
list.files(tempdir)
#> [1] "data_.json" "data_848204306566320128.json"
#> [3] "data_848950153520218113.json" "query"
#> [5] "users_.json" "users_848204306566320128.json"
#> [7] "users_848950153520218113.json"
data <- bind_tweets(data_path = tempdir, output_format = "tidy")
#> Error in `stop_subscript()`:
#> ! Can't rename columns that don't exist.
#> ✖ Column `id` doesn't exist.
data_raw <- bind_tweets(data_path = tempdir, output_format = "raw")
Created on 2022-03-10 by the reprex package (v2.0.1)
There are actually two issues here:

1. get_user_timeline shouldn't generate those empty json files in the first place.
2. bind_tweets can't handle those empty json files.

@TimBMK I will keep this issue focused on the second issue only, and open another issue for the first one.
Hello,
This worked for me: batch_four <- bind_tweets('data', user = FALSE, verbose = TRUE, output_format = "raw")
but when trying to convert to csv with: write.csv(batch_four, 'batch_4.csv')
I get this error: Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE, : arguments imply differing number of rows: 538, 519, 575, 392, 190, 1, 282, 603, 111
@psalmuel19 this is unrelated to the issue mentioned above, as it is clearly caused by write.csv() rather than the bind_tweets() function. I suspect the nested lists in the raw data format cause the problems. Try unnesting batch_four, or use output_format = "tidy" when binding the tweets. If the issue persists, please open a separate issue.
@TimBMK
I should have mentioned that I did that and got the error below:
batch_four <- bind_tweets('data', user = FALSE, verbose = TRUE, output_format = "tidy")
Error in chr_as_locations():
! Can't rename columns that don't exist.
✖ Column `id` doesn't exist.
While searching for a solution, I came across the output_format = "raw" code. It worked in binding but I now can't convert to csv. Any suggestions please?
As mentioned in the original post, the easiest fix to get the tidy format to work is to go into the folder with the data and manually delete the empty "data_.json" files. This fixes the issue with the tidy format, as the issue with the non-existent id column does not come up.
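If you prefer not to delete the files by hand, the same workaround can be scripted before binding. A minimal sketch, assuming your data folder is called 'data' (substitute your own data_path):

```r
# Remove the ID-less (empty) json files that break the tidy format,
# then bind as usual.
data_path <- "data"
empty_files <- file.path(data_path, c("data_.json", "users_.json"))
file.remove(empty_files[file.exists(empty_files)])

batch_four <- bind_tweets(data_path, output_format = "tidy")
```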
The raw format does not output a dataframe, but a list of tibbles (a type of dataframe) of different lengths, each containing different information (this is what the API returns originally). If you are set on using the raw format, you will have to decide what information you want to export to .csv. If you look at the structure of the raw data object (batch_four in your case), it is relatively self-explanatory what you get in each of the tibbles. An easy way to do this yourself is with
names(batch_four)
In order to export the data, you can write the tibbles by referencing them explicitly, e.g.
write.csv(batch_four$tweet.main, file = "batch_4.csv")
tweet.main contains the main information of the tweet; additional information (e.g. metrics) would need to be matched together. You can use dplyr's left_join() function for this and use the tweet_id as an indicator for matching. As I mentioned above, however, removing the problematic files by hand will enable the tidy format, which gives you all relevant data in a neat and ready-made format.
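To illustrate the matching step, a minimal sketch with dplyr. The element name tweet.public_metrics and the key column tweet_id are assumptions based on the raw format's naming scheme, so check names(batch_four) for the exact names in your own data:

```r
library(dplyr)

# Join the per-tweet metrics onto the main tweet table by tweet id,
# then export the combined table to csv.
combined <- left_join(batch_four$tweet.main,
                      batch_four$tweet.public_metrics,
                      by = "tweet_id")
write.csv(combined, file = "batch_4_combined.csv", row.names = FALSE)
```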
Describe the bug
As soon as there is a .json file without an ID ("data_.json") in the data_path of bind_tweets(), the function fails with an error if set to the "tidy" format. Generating the "raw" format, however, is not an issue. The following error occurs:
The data_.json is usually an empty file, but it seems to get generated whenever native academictwitteR functions do not return any twitter data (empty pages). The last three times I used get_user_timeline(), I ended up with these empty files. Deleting the data_.json file fixes the error. Furthermore, I believe the problem only started occurring after I updated academictwitteR to 0.3.1; I don't think it occurred under 0.2.1.
Expected Behavior
I would suggest some sort of failsafe that automatically skips .json files without an ID, as they seem to be empty anyway.
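For instance, the binding step could select files with a pattern that requires an ID between the prefix and the extension. A sketch of the idea, not the package's actual implementation:

```r
# "data_.json" has nothing between the prefix and the extension,
# so requiring at least one character there (.+) skips the empty files.
data_path <- "data"  # hypothetical folder, substitute your own data_path
json_with_id <- list.files(data_path,
                           pattern = "^data_.+\\.json$",
                           full.names = TRUE)
```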
Anything else?
Possibly related to #218