DocNow / twarc-csv

A plugin for twarc2 for converting tweet JSON into DataFrames and exporting to CSV.
MIT License

How to search for URLs? #31

Closed by numeroteca 3 years ago

numeroteca commented 3 years ago

I am trying to search for two different URLs (URL1 OR URL2), but I cannot get it to work or figure out how to escape the characters. Is this the right method?

twarc2 search '(https://www.elconfidencial.com/espana/madrid/2021-09-07/universidad-periodismo-complutense-profesores_3218500, OR https://www.infolibre.es/noticias/opinion/columnas/2021/09/08/la_verdad_sobre_caso_quiros_una_cronica_primera_persona_124235_1023.html)' > search_210913.json

⚡ There were errors processing your request: no viable alternative at character '/' (at position 122), no viable alternative at character '/' (at position 8), no viable alternative at character '/' (at position 9), no viable alternative at character '/' (at position 123)

igorbrigadir commented 3 years ago

What OS / terminal are you using? I can't test right now, but try double quotes and the url operator instead of single quotes. If that fails, try searching for unique fragments of the URL, again with the url operator, avoiding - and maybe combining two operators, like:

twarc2 search "((url:elconfidencial.com url:profesores_3218500) OR url:la_verdad_sobre_caso_quiros_una_cronica_primera_persona_124235_1023.html)" search_210913.json

numeroteca commented 3 years ago

I am using the Ubuntu terminal.

The double quotes didn't solve the problem.

The example command you sent seems to work. Thanks!

edsu commented 3 years ago

Quoting and using the url search operator seems to work nicely?

twarc2 search 'url:"https://www.elconfidencial.com/espana/madrid/2021-09-07/universidad-periodismo-complutense-profesores_3218500" OR url:"https://www.infolibre.es/noticias/opinion/columnas/2021/09/08/la_verdad_sobre_caso_quiros_una_cronica_primera_persona_124235_1023.html"'

igorbrigadir commented 3 years ago

Ah even better!
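For anyone who needs to search for more than two URLs, the quoted form from edsu's command can be built up in the shell. This is only a minimal sketch: `build_url_query` is a hypothetical helper name, not something twarc2 provides, and it assumes the URLs contain no double quotes. It wraps each argument in `url:"..."` and joins the pieces with `OR`.

```shell
# build_url_query is a hypothetical helper, not part of twarc2.
# It wraps every argument in url:"..." and joins the pieces with OR.
# Assumption: the URLs themselves contain no double-quote characters.
build_url_query() {
  query=""
  for u in "$@"; do
    if [ -z "$query" ]; then
      query="url:\"$u\""
    else
      query="$query OR url:\"$u\""
    fi
  done
  printf '%s\n' "$query"
}

# Example usage (left commented out because it needs configured twarc2 credentials):
# twarc2 search "$(build_url_query \
#   "https://www.elconfidencial.com/espana/madrid/2021-09-07/universidad-periodismo-complutense-profesores_3218500" \
#   "https://www.infolibre.es/noticias/opinion/columnas/2021/09/08/la_verdad_sobre_caso_quiros_una_cronica_primera_persona_124235_1023.html")" \
#   search_210913.json
```

Wrapping the command substitution in double quotes passes the whole query to twarc2 as a single argument, which is the same thing the outer single quotes do in the command above.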