florianmarkusse / html-parser

Lenient HTML-parser written in C.
GNU Lesser General Public License v2.1

Missing file in main branch #48

Open mareb opened 1 week ago

mareb commented 1 week ago

If I clone the project with "git clone https://github.com/florianmarkusse/html-parser.git", I cannot compile it, because "array.h" and some other files are missing. With "git checkout tags/v2.0.0" I get the older version, which works.

florianmarkusse commented 1 week ago

Hi there! I have currently mothballed this project since I’m trying to focus on building my operating system.

This project's goal is to be able to parse most HTML5-compliant files, i.e., mostly supporting standard XML markup, and not bothering with the older HTML standards.

Is there any way I can help you with what you are trying to accomplish?

mareb commented 1 week ago

Thank you for your answer! Please do a clone in a new directory with "git clone https://github.com/florianmarkusse/html-parser.git" and start a build. You will see the missing files in the compiler errors. It's not easy to understand your library. Have you written any documentation that I haven't found? Or do you have a more extensive example of how it's used? (I saw the demo-c project.)

I wish you much success in developing your operating system. I developed an enterprise resource planning system and an alternative system for PHP-based websites (in C and C++) and I can imagine how much work you still have ahead of you.

florianmarkusse commented 1 week ago

I fixed the build; it seems I had added a directory to the gitignore file. So at least it compiles now.

As for how it works, there is no real extensive documentation and I would certainly not use it in any final product at all. This project was never intended to be used in a browser, if that is what you're looking for. The idea, which I concur is no longer properly communicated in the code or documentation, was to provide a way to programmatically build multiple HTML files at build time. It is similar to how React uses components, except that React does this at run time, whereas I intended to do it at build time for this project.

The way it stores the parsed HTML is akin to how you would store data in a relational database, as opposed to a more object-oriented design.

For example, every HTML tag is a node that gets an ID, say 5, and a type. This is added to the nodes array.

Then any children that are parsed inside this tag are also added to this nodes array. In addition, an entry is added in the parentChilds array for each of these tags:

parent ID   child ID
5           6
5           7
5           8

...
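A minimal sketch of that layout in C, for illustration only. Every name here (Node, ParentChild, Dom, addNode, addChild, childrenOf) and the fixed capacities are assumptions made for the example and are not taken from the library's actual code. It shows the two flat tables and how the children of a node are found by scanning the parent/child table, much like a join on a relation table.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical types; the real library's definitions may differ. */
typedef struct {
    unsigned id;      /* unique node ID, e.g. 5 */
    unsigned tagType; /* which kind of tag this node is */
} Node;

typedef struct {
    unsigned parentID;
    unsigned childID;
} ParentChild;

enum { MAX_NODES = 64, MAX_LINKS = 64 }; /* arbitrary sketch capacities */

typedef struct {
    Node nodes[MAX_NODES];               /* one row per parsed tag */
    size_t nodeCount;
    ParentChild parentChilds[MAX_LINKS]; /* one row per parent/child pair */
    size_t linkCount;
} Dom;

/* Append a row to the nodes table; returns 0 on success, -1 if full. */
static int addNode(Dom *dom, unsigned id, unsigned tagType) {
    assert(dom != NULL);
    if (dom->nodeCount >= MAX_NODES) return -1;
    dom->nodes[dom->nodeCount++] = (Node){id, tagType};
    return 0;
}

/* Append a row to the parent/child table; returns 0 on success, -1 if full. */
static int addChild(Dom *dom, unsigned parentID, unsigned childID) {
    assert(dom != NULL);
    if (dom->linkCount >= MAX_LINKS) return -1;
    dom->parentChilds[dom->linkCount++] = (ParentChild){parentID, childID};
    return 0;
}

/* Copy up to cap child IDs of parentID into out, in insertion order.
   Returns the number of IDs written. */
static size_t childrenOf(const Dom *dom, unsigned parentID,
                         unsigned *out, size_t cap) {
    assert(dom != NULL && out != NULL);
    size_t n = 0;
    for (size_t i = 0; i < dom->linkCount && n < cap; i++) {
        if (dom->parentChilds[i].parentID == parentID) {
            out[n++] = dom->parentChilds[i].childID;
        }
    }
    return n;
}
```

With a zero-initialized Dom, adding nodes 5 to 8 and then calling addChild(&dom, 5, 6), addChild(&dom, 5, 7) and addChild(&dom, 5, 8) reproduces the three rows of the table above; childrenOf(&dom, 5, ...) then yields 6, 7 and 8. The trade-off of this flat layout is that lookups are linear scans, while the data stays contiguous and free of per-node pointers.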

This is in short how it works. This project was my first serious foray back into writing C code so it contains many things that I would do differently now, but it should be mostly readable?

In any case, I hope this elucidates the functioning a bit. If you have more questions, let me know! I won't be maintaining this repo until I get back to working on HTML parsing.

Thanks for the wishes, I hope you find something useful in this repo. ERP systems are very complex too, you must have learnt a lot!

mareb commented 1 week ago

Thank you!

mareb commented 1 week ago

I have been a C programmer since 1987, and today I use my own set of rules and encapsulated objects in C. I can send you some code snippets that you might find useful. But where? Is there a way to send private mail on GitHub? I haven't had time to publish my source yet, but it is under the GPL 2 license.

florianmarkusse commented 3 days ago

Sorry for the late response, life is rather busy these weeks.

It would be interesting to see! You can create a private repo and invite me, or share it via my email, florianmarkusse@hotmail.com, if you are willing.