Open AmauryVanEspen opened 5 hours ago
Hello,
It only saves to an HTML file, which is probably not a generic format. XML and HTML are different formats. Have a great day.
1. XML Specification
2. HTML5
When to Use Each
XML is ideal for:
HTML5 is ideal for:
I once uploaded a PDF to a website (I don't remember the name, some online CV maker), and it could extract values (maybe not all) from the .pdf, which had been created by saving the HTML page as PDF in the browser. It probably went by the titles (semantic markup). For example:
<section>
<h2 class="left">Address(es)</h2>
<h3 class="righ">Address Address, Address Address (Address)</h3>
</section>
So maybe they took the heading (h2) first and attached the values from (h3). I'm not sure how a PDF is structured; I just know the page can be saved (printed) to a PDF file. PHP probably has tools for creating and working with PDF files, so the server (probably PHP) could extract the values somehow.
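The heading/value pairing guessed at above can be sketched in a few lines. This is only an illustration under assumptions: `extractPairs` is a hypothetical name, the input is the HTML source as a string, and the regular expressions only handle the simple `<section><h2>…</h2><h3>…</h3></section>` shape shown in the example. In the browser, `querySelectorAll` on the live page would be the more robust choice.

```javascript
// Hypothetical helper (not part of the project): collect label/value pairs
// from markup shaped like <section><h2>Label</h2><h3>Value</h3></section>.
function extractPairs(html) {
  const pairs = [];
  const sectionRe = /<section\b[^>]*>([\s\S]*?)<\/section>/gi;
  for (const [, body] of html.matchAll(sectionRe)) {
    // First h2 is treated as the label, first h3 as the value.
    const label = body.match(/<h2\b[^>]*>([\s\S]*?)<\/h2>/i);
    const value = body.match(/<h3\b[^>]*>([\s\S]*?)<\/h3>/i);
    if (label && value) {
      pairs.push({ label: label[1].trim(), value: value[1].trim() });
    }
  }
  return pairs;
}

const sample = `<section>
<h2 class="left">Address(es)</h2>
<h3 class="righ">Address Address, Address Address (Address)</h3>
</section>`;

console.log(extractPairs(sample));
```

Sections without both an h2 and an h3 are skipped, so the result only contains complete pairs.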
Nice, do you believe that we can build an XML format from the values?
I tested the HTML source with online HTML-to-XML tools, and they seem to convert tables best. So changing the HTML structure to tables may be the most generic approach, since converters appear to read tables most reliably.

That could be done by modifying the structure of the source code: either convert the markup to tables and adjust the JS to match, or have the JS change the structure when the file is saved. In short, either the HTML structure needs to change, or the JS could loop over the elements, add some dataset attributes, and then save all the values in a loop. Probably the most accurate way is to add attributes such as 'Name', 'Address', etc. to the source, then inject the values into the XML source and save it.

According to AI, only the structure needs to change. For example:

To convert HTML to XML, you need to ensure that the content is well-formed according to XML rules. Here's what you should follow:
Use Proper Tag Nesting: Tags must be properly opened and closed.
No Unquoted Attribute Values: All attribute values must be enclosed in double or single quotes.
No Special Characters: Special characters (e.g., &, <, >) must be escaped.
No Implicit Tag Closures: All tags must explicitly close (e.g., <br /> instead of <br>).
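As a rough illustration of the escaping rule above, combined with the idea of labelled fields, here is a minimal sketch. The names `escapeXml` and `fieldsToXml` and the root tag `cv` are assumptions made for this example, not part of the project, and the field names are assumed to already be valid XML tag names.

```javascript
// Replace the five characters XML treats specially with predefined entities.
// "&" must be handled first so the other replacements are not double-escaped.
function escapeXml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Hypothetical helper: turn { tagName: value } pairs into a small XML document.
// Keys are used as tag names unchanged, so they must already be valid names.
function fieldsToXml(fields, rootTag = "cv") {
  const body = Object.entries(fields)
    .map(([tag, value]) => `  <${tag}>${escapeXml(value)}</${tag}>`)
    .join("\n");
  return `<?xml version="1.0" encoding="UTF-8"?>\n<${rootTag}>\n${body}\n</${rootTag}>`;
}

console.log(fieldsToXml({ name: "John Doe", company: "R&D <Labs>" }));
```

Every opening tag is written together with its closing tag, so the nesting rule is satisfied by construction.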
Here’s how your provided HTML can be transformed into valid XML:
cv structure.txt
I asked AI for code and it produced this. If you paste it into the browser console, you get the XML file directly.
function generateXML() {
  // Escape characters that are not allowed in XML text content
  function escapeXML(text) {
    return text.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
  }

  // Recursively convert a DOM element into an XML string
  function elementToXML(element) {
    const tagName = element.tagName.toLowerCase();
    let xmlContent = "";
    // Leaf elements keep their text; elements with children are walked recursively.
    // Empty elements, and text mixed in between child elements, are skipped.
    if (element.children.length === 0 && element.textContent.trim()) {
      xmlContent = `<${tagName}>${escapeXML(element.textContent.trim())}</${tagName}>`;
    } else if (element.children.length > 0) {
      xmlContent = `<${tagName}>`;
      Array.from(element.children).forEach(child => {
        xmlContent += elementToXML(child);
      });
      xmlContent += `</${tagName}>`;
    }
    return xmlContent;
  }

  // Select the main container to start processing
  const mainContainer = document.querySelector("#main");
  if (!mainContainer) {
    console.error("Main container not found.");
    return;
  }

  // Generate XML from the children of the main container
  let xml = `<?xml version="1.0" encoding="UTF-8"?>\n<document>`;
  Array.from(mainContainer.children).forEach(child => {
    xml += elementToXML(child);
  });
  xml += `</document>`;

  // Offer the result as a downloadable XML file
  const blob = new Blob([xml], { type: "application/xml" });
  const link = document.createElement("a");
  link.href = URL.createObjectURL(blob);
  link.download = "document.xml";
  document.body.appendChild(link);
  link.click();
  document.body.removeChild(link);
  URL.revokeObjectURL(link.href); // Release the temporary object URL
  // console.log("XML Generated:\n", xml); // Log XML for debugging
}

// Trigger the function (you can bind this to a button click instead)
generateXML();
And structure would look like this for example: `
`
Yes, if you are asking about tag names, it could be done like this: <email_address>example@example.com</email_address>, which is said to be valid. According to AI, tags cannot contain spaces, but you can use hyphens or underscores to separate words.
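A label such as "Email address" could be turned into a tag name like the one above with a small helper. This is a sketch under assumptions: `toXmlTagName` is a hypothetical name, and it deliberately restricts names to lowercase ASCII letters, digits, `_`, `-` and `.`, which is narrower than what XML actually allows.

```javascript
// Hypothetical helper: derive a conservative XML tag name from a visible label.
function toXmlTagName(label) {
  let name = String(label)
    .trim()
    .toLowerCase()
    .replace(/[^a-z0-9_.-]+/g, "_") // spaces, brackets, etc. become "_"
    .replace(/^_+|_+$/g, "");       // drop leading/trailing underscores
  // An XML name cannot be empty and cannot start with a digit, "-" or ".".
  if (name === "" || !/^[a-z_]/.test(name)) {
    name = "_" + name;
  }
  return name;
}

console.log(toXmlTagName("Email address"));
console.log(toXmlTagName("Address(es)"));
```

Non-ASCII letters are also replaced by underscores here, so labels in other languages would need a transliteration step or a less strict pattern.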
Do you believe it could be compliant with the JSON Resume Schema? https://jsonresume.org/schema
We would just need to read the values by id to build the JSON. For example:

```javascript
function extractDataToJson() {
  // Extracting data from the HTML page
  const cvData = {
    basics: {
      name: document.querySelector("#vardas") ? document.querySelector("#vardas").textContent.trim() : "John Doe",
      label: document.querySelector("#pozicija") ? document.querySelector("#pozicija").textContent.trim() : "Programmer",
      image: document.querySelector("img#profile-picture") ? document.querySelector("img#profile-picture").src : "",
      email: document.querySelector("#email a") ? document.querySelector("#email a").href.replace('mailto:', '').trim() : "",
      phone: document.querySelector("#number a") ? document.querySelector("#number a").href.replace('tel:', '').trim() : "",
      url: document.querySelector("#website a") ? document.querySelector("#website a").href : "",
      summary: document.querySelector("#summary") ? document.querySelector("#summary").textContent.trim() : "No summary available",
      location: {
        address: document.querySelector("#address") ? document.querySelector("#address").textContent.trim() : "",
        postalCode: document.querySelector("#postalCode") ? document.querySelector("#postalCode").textContent.trim() : "",
        city: document.querySelector("#city") ? document.querySelector("#city").textContent.trim() : "",
        countryCode: document.querySelector("#countryCode") ? document.querySelector("#countryCode").textContent.trim() : "",
        region: document.querySelector("#region") ? document.querySelector("#region").textContent.trim() : ""
      },
      profiles: [{
        network: "Twitter",
        username: "europassmaker", // Replace with actual username if available in HTML
        url: "https://twitter.com/europassmaker" // Replace with actual URL if available
      }]
    },
    work: [{
      name: document.querySelector("#companyName") ? document.querySelector("#companyName").textContent.trim() : "Company Name",
      position: document.querySelector("#jobTitle") ? document.querySelector("#jobTitle").textContent.trim() : "Position",
      url: document.querySelector("#companyWebsite a") ? document.querySelector("#companyWebsite a").href : "",
      startDate: document.querySelector("#workStartDate") ? document.querySelector("#workStartDate").textContent.trim() : "",
      endDate: document.querySelector("#workEndDate") ? document.querySelector("#workEndDate").textContent.trim() : "",
      summary: document.querySelector("#workSummary") ? document.querySelector("#workSummary").textContent.trim() : "",
      highlights: [
        document.querySelector("#workHighlights") ? document.querySelector("#workHighlights").textContent.trim() : "No highlights available"
      ]
    }],
    volunteer: [{
      organization: "Volunteer Organization",
      position: "Volunteer Developer",
      url: "https://nonprofit.com/",
      startDate: "2019-01-01",
      endDate: "2020-01-01",
      summary: "Contributed to open-source projects.",
      highlights: ["Developed open-source software"]
    }],
    education: [{
      institution: "University of Lithuania",
      url: "https://university.com/",
      area: "Computer Science",
      studyType: "Bachelor",
      startDate: "2015-09-01",
      endDate: "2019-06-01",
      score: "4.0",
      courses: [
        "CS101 - Introduction to Programming",
        "CS102 - Data Structures"
      ]
    }],
    awards: [{
      title: "Best Developer",
      date: "2021-06-01",
      awarder: "Company XYZ",
      summary: "Awarded for excellence in development."
    }],
    certificates: [{
      name: "Certified Web Developer",
      date: "2022-11-07",
      issuer: "Certification Body",
      url: "https://certificate.com"
    }],
    publications: [{
      name: "Creating Perfect CV Templates",
      publisher: "Tech Journal",
      releaseDate: "2022-05-01",
      url: "https://publication.com",
      summary: "A detailed guide on designing CV templates."
    }],
    skills: [{
      name: "Web Development",
      level: "Advanced",
      keywords: ["HTML", "CSS", "JavaScript", "PHP"]
    }],
    languages: [{
      language: "English",
      fluency: "Native speaker"
    }],
    interests: [{
      name: "Technology",
      keywords: ["AI", "Machine Learning"]
    }],
    references: [{
      name: "Jane Doe",
      reference: "John is a skilled developer who contributed significantly to our projects."
    }],
    projects: [{
      name: "CV Maker Project",
      startDate: "2019-01-01",
      endDate: "2021-01-01",
      description: "A project to create customizable CV templates.",
      highlights: [
        "Developed user-friendly templates",
        "Integrated with Europass standards"
      ],
      url: "https://cvproject.com/"
    }]
  };

  // Return the populated JSON
  return cvData;
}

// Example usage:
const jsonData = extractDataToJson();
console.log(JSON.stringify(jsonData, null, 2));
```
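The repeated "query, check, trim, fall back" pattern in the snippet above could be factored into one small helper. The following is only a sketch: `textOf` and `basicsFrom` are hypothetical names, and `fakeRoot` is a stand-in object so the example runs outside a browser. On the real page, `document` would be passed instead.

```javascript
// Hypothetical helper: trimmed text of the first match, or a fallback if
// no element matches. `root` is anything with a querySelector method.
function textOf(root, selector, fallback = "") {
  const el = root.querySelector(selector);
  return el ? el.textContent.trim() : fallback;
}

// Builds part of the "basics" object using the ids from the snippet above.
function basicsFrom(root) {
  return {
    name: textOf(root, "#vardas", "John Doe"),
    label: textOf(root, "#pozicija", "Programmer"),
    summary: textOf(root, "#summary", "No summary available"),
  };
}

// Stand-in for `document`, for illustration only: it knows a single element.
const fakeRoot = {
  querySelector(selector) {
    return selector === "#vardas" ? { textContent: "  Jane Roe \n" } : null;
  },
};

console.log(JSON.stringify(basicsFrom(fakeRoot), null, 2));
```

As in the original snippet, an element that exists but is empty yields an empty string rather than the fallback.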
Hi @KostasSliazas, is the rendered HTML file compliant with the XML format? Thank you, Amaury