dogsheep / github-to-sqlite

Save data from GitHub to a SQLite database
https://github-to-sqlite.dogsheep.net/
Apache License 2.0

Readme HTML has broken internal links #58

Closed by simonw 3 years ago

simonw commented 3 years ago

From https://github.com/simonw/datasette.io/issues/46

<li><a href="#filtering-tables">Filtering tables</a></li>
...
<h3><a id="user-content-filtering-tables" class="anchor" aria-hidden="true" href="#filtering-tables"><svg class="octicon octicon-link" viewBox="0 0 16 16" version="1.1" width="16" height="16" aria-hidden="true"><path fill-rule="evenodd" d="M7.775 3.275a.75.75 0 001.06 1.06l1.25-1.25a2 2 0 112.83 2.83l-2.5 2.5a2 2 0 01-2.83 0 .75.75 0 00-1.06 1.06 3.5 3.5 0 004.95 0l2.5-2.5a3.5 3.5 0 00-4.95-4.95l-1.25 1.25zm-4.69 9.64a2 2 0 010-2.83l2.5-2.5a2 2 0 012.83 0 .75.75 0 001.06-1.06 3.5 3.5 0 00-4.95 0l-2.5 2.5a3.5 3.5 0 004.95 4.95l1.25-1.25a.75.75 0 00-1.06-1.06l-1.25 1.25a2 2 0 01-2.83 0z"></path></svg></a>Filtering tables</h3>

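The table of contents link points at #filtering-tables, but the only matching anchor in the rendered HTML has id="user-content-filtering-tables", so the link goes nowhere. This is a bug in GitHub's API, but we need to work around it.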

simonw commented 3 years ago

I'm going to rewrite those <a href="#filtering-tables"> links to <a href="#user-content-filtering-tables"> - but only if a corresponding id="user-content-filtering-tables" element exists.

simonw commented 3 years ago

I don't want to add a full HTML parser (like BeautifulSoup) as a dependency for this feature. Since the HTML comes from a single, trusted source (GitHub) I could probably handle this using regular expressions.
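
A rough sketch of how that regular expression approach might look. The function name `rewrite_readme_links` and both patterns are hypothetical, written for illustration rather than taken from the github-to-sqlite source, and they assume GitHub keeps emitting double-quoted `id` and `href` attributes as in the sample above:

```python
import re

# Hypothetical patterns; they assume double-quoted attributes as in GitHub's output.
ID_RE = re.compile(r'\bid="(user-content-[^"]+)"')
HREF_RE = re.compile(r'(<a\b[^>]*?\bhref=")#([^"]+)(")')


def rewrite_readme_links(html):
    """Point #fragment links at #user-content-fragment when that id exists."""
    # Collect every user-content-* id present in the document.
    ids = set(ID_RE.findall(html))

    def fix(match):
        target = "user-content-" + match.group(2)
        if target in ids:
            # A matching element exists, so rewrite the fragment.
            return match.group(1) + "#" + target + match.group(3)
        # No matching element: leave the link exactly as it was.
        return match.group(0)

    return HREF_RE.sub(fix, html)
```

Links whose fragment has no `user-content-` counterpart are left untouched, and links that already carry the prefix are not rewritten a second time, so running the function twice should give the same result as running it once.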