chatnoir-eu / chatnoir-resiliparse

A robust web archive analytics toolkit
https://resiliparse.chatnoir.eu
Apache License 2.0

fatal error: html.h: No such file or directory #27

Closed: davidtbo closed this issue 6 months ago.

davidtbo commented 1 year ago

Hello,

I'm trying to build Resiliparse 0.13.7 from source, and I'm getting this error. Can you tell me which library Resiliparse is expecting to get html.h from? I suspect I'm missing a dependency.

resiliparse/extract/html2text.cpp:869:10: fatal error: html.h: No such file or directory
 #include "html.h"
          ^~~~~~~~

Thanks, Dave

phoerious commented 1 year ago

html.h should be part of the source package. If that file is somehow missing from the package you downloaded, try building from the Git repository instead.
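A minimal sketch for checking whether a downloaded source archive actually contains html.h, and in which directory. It uses only the Python standard library and handles both .zip and .tar(.gz) archives; the function name find_headers is hypothetical and not part of Resiliparse.

```python
import tarfile
import zipfile
from pathlib import PurePosixPath
from typing import List


def find_headers(archive_path: str, header_name: str = "html.h") -> List[str]:
    """Return every archive member whose file name equals header_name.

    Hypothetical helper: works on .zip files and on any tar archive
    that the tarfile module can open (plain, .gz, .bz2, .xz).
    """
    if zipfile.is_zipfile(archive_path):
        with zipfile.ZipFile(archive_path) as zf:
            names = zf.namelist()
    elif tarfile.is_tarfile(archive_path):
        # Mode "r" auto-detects the compression scheme.
        with tarfile.open(archive_path) as tf:
            names = tf.getnames()
    else:
        raise ValueError(f"not a zip or tar archive: {archive_path}")
    # Archive member names always use "/" separators, so compare only
    # the last component and report the full member path.
    return sorted(n for n in names if PurePosixPath(n).name == header_name)
```

Calling find_headers on the downloaded source zip prints the full member path of every html.h it contains, which shows directly whether the header is present and whether it sits under resiliparse/parse/ or resiliparse/extract/.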

davidtbo commented 1 year ago

I downloaded the 0.13.7 source zip file, and that file IS there if the error refers to resiliparse/parse/html.h. However, there is no html.h file in resiliparse/extract/, and that is the directory where this error occurs.

At the top of that html2text.cpp file I see this:

/* Generated by Cython 0.29.32 */

/* BEGIN: Cython Metadata
{
    "distutils": {
        "depends": [
            "resiliparse/parse/html.h"
        ],
        "extra_compile_

Is this referring to an html.h that's supposed to be in the extract or the parse directory?
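One way to answer this is to read the whole metadata block rather than its first few lines. The sketch below assumes, as in the excerpt above, that the block sits between "/* BEGIN: Cython Metadata" and "END: Cython Metadata */" and contains valid JSON. The function names are hypothetical, and the "include_dirs" key is an assumption that may simply be absent from a given file. The "depends" entries look like paths relative to the project root rather than to the .cpp file, but that reading is not confirmed in this thread.

```python
import json
import re
from typing import Any, Dict, List

# Matches the JSON object between Cython's metadata markers.
_METADATA_RE = re.compile(
    r"/\* BEGIN: Cython Metadata\s*(\{.*?\})\s*END: Cython Metadata \*/",
    re.DOTALL,
)


def read_cython_metadata(generated_source: str) -> Dict[str, Any]:
    """Parse the metadata block at the top of a Cython-generated C/C++ file."""
    match = _METADATA_RE.search(generated_source)
    if match is None:
        raise ValueError("no Cython metadata block found")
    return json.loads(match.group(1))


def header_dependencies(generated_source: str) -> List[str]:
    """Return the "depends" list recorded under the "distutils" key."""
    meta = read_cython_metadata(generated_source)
    return meta.get("distutils", {}).get("depends", [])


def include_dirs(generated_source: str) -> List[str]:
    """Return "include_dirs" if the metadata records it (assumed key, may be absent)."""
    meta = read_cython_metadata(generated_source)
    return meta.get("distutils", {}).get("include_dirs", [])
```

Running header_dependencies on the contents of html2text.cpp would list resiliparse/parse/html.h, matching the excerpt; an empty include_dirs result would mean the metadata itself does not record where the compiler should look for it.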

phoerious commented 1 year ago

I'm not quite sure why your compiler would look for it there. This is the relevant import:

https://github.com/chatnoir-eu/chatnoir-resiliparse/blob/develop/resiliparse/resiliparse/extract/html2text.pyx#L26

Please try building it from the Git repository instead. Perhaps the file wasn't included properly in the source distribution.
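For a quoted include such as #include "html.h", GCC first searches the directory of the file containing the directive (here resiliparse/extract/) and only then the -iquote and -I directories. So looking in resiliparse/extract/ first is expected; the error means html.h was found neither there nor in any directory passed with -I. The sketch below models that lookup in a simplified way: it ignores -iquote, -isystem, and the built-in system directories, and resolve_quoted_include is a hypothetical name, not a GCC or Resiliparse API.

```python
from pathlib import Path
from typing import Optional, Sequence


def resolve_quoted_include(including_file: str, header: str,
                           include_dirs: Sequence[str]) -> Optional[Path]:
    """Simplified model of GCC's search for #include "header".

    Checks the including file's own directory first, then each -I
    directory in command-line order, and returns the first match.
    Returns None when nothing matches ("No such file or directory").
    """
    candidates = [Path(including_file).parent, *map(Path, include_dirs)]
    for directory in candidates:
        path = directory / header
        if path.is_file():
            return path
    return None
```

Under this model, compiling resiliparse/extract/html2text.cpp only succeeds if resiliparse/parse (or wherever html.h actually lives) is on the include path; one plausible reading of the error is that it was not in this particular build, although the thread does not confirm the cause.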