ContentMine / getpapers

Get metadata, fulltexts or fulltext URLs of papers matching a search query
MIT License

Nested Boolean searches #15

Closed chartgerink closed 9 years ago

chartgerink commented 9 years ago

Hi,

I am trying to do nested Boolean searches, but I immediately receive an error that an operator was unexpected. More specifically, the operator that combines the two Boolean searches is what triggers the error. These kinds of searches do work in EuropePMC directly, btw.

The search I used is getpapers --query '(TITLE: "QRP" AND TITLE:"misconduct") OR (PUB_TYPE:"retraction of publication")' --outdir test, see also the attached image.

If I remove the Boolean part from OR onward I also receive an error, so it seems that the parentheses might also be the cause of the problem. Any help on why this creates an error and whether this is solvable?

Kind regards, Chris Hartgerink

[Attached image: nested boolean]

blahah commented 9 years ago

The query you quoted above doesn't work for me because of the space between TITLE: and "QRP". However, in the screenshot you don't have the space, so I guess that's not the problem.

If I run the command without the space, it works for me:

$ getpapers --query '(TITLE:"QRP" AND TITLE:"misconduct") OR (PUB_TYPE:"retraction of publication")' --outdir test
info: Found 1027600 open access results;

I killed it at that point because I don't want to download 1 million papers :)

This could be a Windows-specific problem. It might be something to do with the way Windows CMD is parsing the command - it might be interpreting the brackets. Could you try:

getpapers --query '\(TITLE:"QRP" AND TITLE:"misconduct"\) OR \(PUB_TYPE:"retraction of publication"\)' --outdir test

Unfortunately I don't have a Windows machine to test on.

chartgerink commented 9 years ago

I tried as recommended, but it does not work: I get the same error. It seems like a Windows-specific error. I tried running just getpapers --query '("QRP")' --outdir test, but that also gives me an error (also when including the escape \). I also tried running it in PowerShell, with equivalent results.

Should the OS make a difference outside of the parsing? (I figure not, but I am not familiar with Node.js, nor am I that experienced in programming, just scripting.)

blahah commented 9 years ago

The OS should make no difference to the running of the code. Although I haven't tested on Windows, I have used cross-platform code throughout (I think).

According to this guide to Windows shell escaping, brackets are safe only inside double-quotes. So maybe you need to put double-quotes around the whole query, instead of singles:

getpapers --query "(TITLE:\"QRP\" AND TITLE:\"misconduct\") OR (PUB_TYPE:\"retraction of publication\")" --outdir test

or, easier to type:

getpapers --query "(TITLE:'QRP' AND TITLE:'misconduct') OR (PUB_TYPE:'retraction of publication')" --outdir test

chartgerink commented 9 years ago

Oh yes! This works :) Thank you.

blahah commented 9 years ago

No problem - which version worked? Or did both?

I will add a note to the README so future Windows users don't get caught.
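
As an illustration of the quoting rule that worked in this thread, here is a minimal Python sketch that turns a EuropePMC query into the double-quoted form for Windows CMD. The helper name `quote_query_for_cmd` is hypothetical and not part of getpapers; it only builds the command string and does not run anything. It assumes the query itself contains no apostrophes when using the single-quote variant.

```python
def quote_query_for_cmd(query: str, escape: bool = False) -> str:
    """Wrap a query in double quotes so Windows CMD leaves the brackets alone.

    escape=False swaps inner double quotes for single quotes (easier to type).
    escape=True keeps the inner double quotes and backslash-escapes each one.
    """
    if escape:
        inner = query.replace('"', '\\"')
    else:
        # Assumption: the query has no apostrophes of its own.
        inner = query.replace('"', "'")
    return f'"{inner}"'


# The query from this issue, written with plain double quotes inside.
query = '(TITLE:"QRP" AND TITLE:"misconduct") OR (PUB_TYPE:"retraction of publication")'

# Build the command line to paste into CMD (nothing is executed here).
command = f"getpapers --query {quote_query_for_cmd(query)} --outdir test"
print(command)
```

Calling `quote_query_for_cmd(query, escape=True)` instead gives the backslash-escaped variant. Both forms were reported to work on Windows in this thread.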

chartgerink commented 9 years ago

Both worked. My apologies for not specifying.

blahah commented 9 years ago

Great, thanks!