bio-guoda / preston

a biodiversity dataset tracker
MIT License

content retrieved by alias does not match aliased content #248

jhpoelen closed this issue 1 year ago

jhpoelen commented 1 year ago

to reproduce:

  1. track content

     $ preston track https://duckduckgo.com
     $ preston track https://bing.com

  2. retrieve content by alias

     $ preston cat https://bing.com | grep duckduckgo.com

expected results:

no hits

actual results:

$ preston cat https://bing.com | grep duckduckgo
<link rel="canonical" href="https://duckduckgo.com/">
<meta name="apple-itunes-app" content="app-id=663592361, app-argument=https://duckduckgo.com/?smartbanner=1">
<meta name="twitter:site" value="@duckduckgo">
<meta property="og:url" content="https://duckduckgo.com/" />
<meta property="og:image" content="https://duckduckgo.com/assets/logo_social-media.png">
<script type="text/javascript" src="/locale/en_US/duckduckgo14.js" onerror="handleScriptError(this)"></script>
                            <form id="search_form_homepage" class="search  search--home  js-search-form" name="x" method="POST" action="https://html.duckduckgo.com/html">

whereas

$ preston alias https://bing.com 
<https://bing.com> <http://purl.org/pav/hasVersion> <hash://sha256/b02ef6fec9f6683135078e2fc93d3222f75bc781a32d7a204045dc4a0cbf2010> <urn:uuid:78b5e8fb-ebd9-4ab8-a07d-2d064622e015> .
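The alias statement above appears to be an N-Quad: the alias <https://bing.com> is linked to a content hash through the pav:hasVersion predicate. The following is a minimal sketch of resolving an alias by hand, assuming statements arrive one per line with single-space-separated terms, as in the output above. The function name alias_hashes is hypothetical and not part of Preston.

```shell
#!/bin/sh
# Hypothetical helper (not part of Preston): print the content hash(es) that
# an alias resolves to, reading N-Quads statements (as printed by
# `preston alias`) from stdin. Assumes one statement per line.
alias_hashes() {
  awk -v want="<$1>" '
    # keep only "<alias> <http://purl.org/pav/hasVersion> <hash> <graph> ." lines
    $1 == want && $2 == "<http://purl.org/pav/hasVersion>" {
      h = $3
      gsub(/[<>]/, "", h)   # strip the IRI angle brackets
      print h
    }'
}
```

Given the statement above, preston alias https://bing.com | alias_hashes https://bing.com would print hash://sha256/b02ef6fe...2010, the identifier under which the Bing content is stored. Statements for other aliases, such as https://duckduckgo.com, are ignored.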

and

$ preston alias https://bing.com\
 | preston cat\
 | grep duckduckgo\
 | wc -l
0

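So, retrieving content by alias (preston cat https://bing.com) returns the page tracked for a different URL (https://duckduckgo.com), whereas retrieving the content hash that the alias resolves to (preston alias https://bing.com | preston cat) returns the expected results.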
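Because a hash://sha256/... identifier names content by its SHA-256 digest, one way to catch this kind of mismatch without grepping for a specific string is to check that the bytes returned by preston cat <alias> hash to the digest recorded in the alias statement. This is a sketch of such a check, assuming GNU coreutils sha256sum, grep with -o support, and at most one hasVersion statement per alias. check_alias is a hypothetical name, not a Preston command.

```shell
#!/bin/sh
# Hypothetical regression check for this issue (not part of Preston).
# Usage: check_alias URL   (URL must already be tracked via `preston track URL`)
# Assumes GNU coreutils `sha256sum` and one hasVersion statement for URL.
check_alias() {
  url="$1"
  # digest recorded in the alias statement:
  # <URL> <http://purl.org/pav/hasVersion> <hash://sha256/HEX> <graph> .
  expected=$(preston alias "$url" \
    | grep -o 'hash://sha256/[0-9a-f]*' \
    | head -n 1 \
    | sed 's|hash://sha256/||')
  if [ -z "$expected" ]; then
    echo "no hash://sha256 alias statement found for $url" >&2
    return 1
  fi
  # digest of the bytes actually returned when retrieving by alias
  actual=$(preston cat "$url" | sha256sum | cut -d ' ' -f 1)
  if [ "$expected" = "$actual" ]; then
    echo "ok: $url -> hash://sha256/$actual"
  else
    echo "mismatch: $url is aliased to hash://sha256/$expected," \
      "but preston cat returned content with hash://sha256/$actual" >&2
    return 1
  fi
}
```

On a version affected by this issue, check_alias https://bing.com would be expected to report a mismatch. Comparing digests rather than searching for "duckduckgo" keeps the check independent of which URLs were tracked.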

jhpoelen commented 1 year ago

Fixed in v0.6.4