commoncrawl / cc-index-server

Common Crawl Index Server
http://index.commoncrawl.org/

[PyWB2] Query param `fl` is ignored #8

Open sebastian-nagel opened 5 years ago

sebastian-nagel commented 5 years ago

The query parameter that selects the result fields (`fl`) is ignored by PyWB 2.3.0. As visible in the code, it has been renamed from `fl` to `fields`, with a fallback for the old parameter name. However, the fallback does not work, and passing `fl` also causes the `output` parameter to be ignored:

> curl 'http://index-pywb2.commoncrawl.org/CC-MAIN-2019-35-index?url=commoncrawl.org&matchType=domain&fields=url&output=text&limit=1'
http://commoncrawl.org/

> curl 'http://index-pywb2.commoncrawl.org/CC-MAIN-2019-35-index?url=commoncrawl.org&matchType=domain&fl=url&output=text&limit=1'
{"urlkey": "org,commoncrawl)/", "timestamp": "20190818052150", "charset": "UTF-8", "languages": "eng", "url": "http://commoncrawl.org/", "status": "200", "mime": "text/html", "filename": "crawl-data/CC-MAIN-2019-35/segments/1566027313617.6/warc/CC-MAIN-20190818042813-20190818064813-00014.warc.gz", "digest": "FM7M2JDBADOQIHKCSFKVTAML4FL2HPHT", "offset": "42695747", "mime-detected": "text/html", "length": "5413", "source": "CC-MAIN-2019-35/indexes/cluster.idx", "source-coll": "CC-MAIN-2019-35"}

sebastian-nagel commented 3 years ago

Note: the old index, based on pywb 0.33.2, already recognizes the `fields` parameter.
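
Since `fields` appears to be accepted by both the old and the new index, a client could sidestep the problem by always sending `fields` instead of `fl`. The following is only a minimal sketch of that workaround: the helper name `build_cdx_query` is hypothetical, the base URL is copied from the examples above, and treating `fields` as a comma-separated list is an assumption. It builds the query URL and sends no request.

```python
from urllib.parse import urlencode

# Base URL copied from the curl examples in this issue.
BASE_URL = "http://index-pywb2.commoncrawl.org"


def build_cdx_query(collection, url, fields, match_type="domain",
                    output="text", limit=1):
    """Build an index query URL that uses `fields` rather than `fl`.

    Hypothetical helper for illustration; not part of pywb or cc-index-server.
    """
    params = {
        "url": url,
        "matchType": match_type,
        # Assumption: multiple field names are passed as a comma-separated list.
        "fields": ",".join(fields),
        "output": output,
        "limit": limit,
    }
    # Keep commas unescaped so the field list stays readable in the URL.
    return f"{BASE_URL}/{collection}-index?{urlencode(params, safe=',')}"


print(build_cdx_query("CC-MAIN-2019-35", "commoncrawl.org", ["url"]))
```

For the single field `url`, this reproduces the first curl URL above, the one that returned the expected plain-text result.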