benmarwick / JSTORr

Simple text mining of journal articles from JSTOR's Data for Research service

1-gram error: "object type 'closure' is not subsettable" #30

Closed cwatl closed 7 years ago

cwatl commented 8 years ago

Hi -

I'm getting the following error with a dataset I downloaded from JSTOR DFR today:

  unpack1grams <- JSTOR_unpack1grams(path = "C:\Users\Charles\Desktop...\phil terms batch")
  reading 1-grams into R...
  done
  reshaping the 1-grams into a document term matrix...
  Loading required package: NLP

  Attaching package: ‘NLP’

  The following object is masked from ‘package:ggplot2’:

      annotate

  done
  arranging bibliographic data...
  Error in cit$id : object of type 'closure' is not subsettable

I am a novice, so it is possible I caused this problem by breaking something in my R configuration at some point. But I have cleared out all objects, started a new project, used packrat, etc. to try to solve it, and it keeps happening. I did not do anything to the zip file or its contents other than decompress it.

Does anything immediately jump out at you? Is there any additional information I can provide?

Thank you, Charles

benmarwick commented 8 years ago

Can you share your dataset with me? JSTOR changes the DFR data format occasionally, and that can cause problems with this package. If you could put it on Dropbox, Google Drive or a similar file-sharing service, I'll take a look.

cwatl commented 8 years ago

Hi Ben - yes thank you. Here's a link to the zip: https://drive.google.com/open?id=0Bz646prBxcB3cnRsOEgyaEtEb0U

I had trouble with this one as well: https://drive.google.com/open?id=0Bz646prBxcB3RHJIVUxFOWtvWkk

histmr commented 8 years ago

I have a similar error:

  JSTOR_unpack2grams()
  reading 2-grams into R...
  |============================================================| 100%
  done
  reshaping the 2-grams into a document term matrix...
  |============================================================| 100%
  done
  Error in cit$id : object of type 'closure' is not subsettable

cwatl commented 8 years ago

I wound up forking the project in order to tinker with it, but again, I'm no programmer. This bug seems to be the result of the function that determines whether the data are in the old or the new format: the variable is assigned the function itself rather than the output of the function. I think adding parentheses in the right place would fix it, but because I knew which version I was using, I just deleted the logic. That resolved this error.
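For anyone else hitting this, here is a minimal sketch of the mistake described above in plain R. `get_format` is a hypothetical stand-in for a format-detection function, not the package's real code; the point is only the difference between assigning a function and calling it.

```r
# Hypothetical stand-in for a function that reports the DfR data format
get_format <- function() list(version = "new")

# Bug: no parentheses, so 'buggy' holds the function object (a closure)
buggy <- get_format
print(is.function(buggy))  # TRUE

# Subsetting a closure with $ raises the error seen in this thread
msg <- tryCatch(buggy$version, error = function(e) conditionMessage(e))
print(msg)  # "object of type 'closure' is not subsettable"

# Fix: call the function so the variable holds its return value (a list)
fixed <- get_format()
print(fixed$version)  # "new"
```

The same check works on a real session: if `class(x)` reports `"function"` for something you expected to be data, a missing pair of parentheses is the likely cause.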

After that, though, I ran into a couple of other errors that I don't recall now, although I think one was a regex error caused by escape characters or something similar. I was able to work through some of them but still got stuck, and ultimately ran out of time for the project I was working on. I can reproduce and document them if you'd like; just let me know.

benmarwick commented 8 years ago

Thanks, yes, if you can show me how you solved the problem that will make it much quicker for me to update the package, and be a great help for other users too.

For the other errors, please open separate issues so I can have a look in more detail (and other users can see what to expect).

I should mention that I'm away from my desk in a low-bandwidth location for the next month, so there may be some delays in updating the package.

brooksambrose commented 7 years ago

The problem is at line 92 of the JSTOR_unpack1grams function. You want to return a cit object but accidentally copy the read_citations function object, so cit is a function instead of a list, hence the error "Error in cit$id : object of type 'closure' is not subsettable". I'm not sure what the argument to read_citations is supposed to be, or I would have submitted a pull request.

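  # assigns the read_citations function itself, not the citations it would return
  cit <- read_citations
  library(stringr)
  # fails here: cit is a closure, so cit$id cannot be subset
  cit$id <- str_extract(chartr("/", "_", cit$id), ".*[^\t]")
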
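For comparison, a sketch of the intended pattern: call the reader, then clean the ids. The thread does not say what argument the package's read_citations expects, so `read_citations_stub()` and its one-row data frame (including the sample id) are hypothetical. Base R `regmatches()`/`regexpr()` stand in for `stringr::str_extract()` with the same pattern so the sketch has no dependencies.

```r
# Hypothetical stand-in for the package's citation reader; returns a data frame
read_citations_stub <- function(path) {
  data.frame(id = "10.2307/1234567\t", stringsAsFactors = FALSE)
}

# Note the parentheses: this calls the reader and stores its result
cit <- read_citations_stub("citations.tsv")
stopifnot(is.data.frame(cit))

# Same cleaning step as in the package: swap "/" for "_", then keep
# everything up to the last non-tab character (drops a trailing tab)
ids <- chartr("/", "_", cit$id)
cit$id <- regmatches(ids, regexpr(".*[^\t]", ids))
print(cit$id)  # [1] "10.2307_1234567"
```

One difference to keep in mind: `str_extract()` returns NA for elements that don't match, while `regmatches()` drops them, so the base R version is only a drop-in replacement when every id matches.
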
benmarwick commented 7 years ago

Thanks @brooksambrose for reminding me about this! Your diagnosis is helpful. I can reproduce the problem, and I'm looking into a fix.

benmarwick commented 7 years ago

@cwatl I have tested your data with the current version, following the fix that @brooksambrose contributed, and it reads in fine. Let me know how you go!