mareb opened 1 week ago
Hi there! I have mothballed this project for now, since I'm trying to focus on building my operating system.
This project's goal is to be able to parse most HTML5-compliant files, i.e., mostly supporting standard XML markup and not bothering with the older HTML standards.
Is there any way I can help you with what you are trying to accomplish?
Thank you for your answer! Please clone the repository into a new directory with `git clone https://github.com/florianmarkusse/html-parser.git` and start a build; the compiler errors will show you which files are missing. Your library is not easy to understand. Have you written any documentation that I haven't found? Or do you have a more extensive example of how it's used? (I saw the demo-c project.)
I wish you much success in developing your operating system. I developed an enterprise resource planning system and an alternative system for PHP-based websites (in C and C++) and I can imagine how much work you still have ahead of you.
I fixed the build; it seems I had added a directory to the .gitignore file. So at least it compiles now.
As for how it works, there is no extensive documentation, and I would certainly not use it in any final product. This project was never intended to be used in a browser, if that is what you're looking for. Its idea, which I agree is no longer properly communicated in the code or documentation, was to provide a way to programmatically build multiple HTML files at build time. It is somewhat like how React uses components, except that React does this at run time, whereas this project was intended to do it at build time.
The way it stores the parsed HTML is akin to how you would store data in a relational database, as opposed to a more object-oriented design.
For example, every HTML tag is a node that gets an ID, say 5, and a type. This node is added to the `nodes` array. Any children that are parsed inside this tag are then also added to the `nodes` array. In addition, an entry is added to the `parentChilds` array for each of these child tags:
| parent ID | child ID |
|---|---|
| 5 | 6 |
| 5 | 7 |
| 5 | 8 |
| ... | ... |
This, in short, is how it works. This project was my first serious foray back into writing C, so it contains many things I would do differently now, but it should be mostly readable.
In any case, I hope this sheds some light on its inner workings. If you have more questions, let me know! I won't be maintaining this repo until I get back to parsing HTML again.
Thanks for the wishes; I hope you find something useful in this repo. ERP systems are very complex too, so you must have learned a lot!
Thank you!
I have been a C programmer since 1987, and today I use my own set of rules and encapsulated objects in C. I can send you some code snippets that you might find useful, but where? How do I send a private message on GitHub? I haven't had time to publish my source yet, but it is under the GPL 2 license.
Sorry for the late response; life is rather busy these weeks.
It would be interesting to see! You can create a private repo and invite me, or share it with me at florianmarkusse@hotmail.com, if you are willing.
If I clone the project with `git clone https://github.com/florianmarkusse/html-parser.git`, I cannot compile it, because "array.h" and some other files are missing. With `git checkout tags/v2.0.0` I get the older version, which works.