europass / europasscv-parser-js

Parse EuropassCV PDF/XML using JavaScript
MIT License
16 stars 7 forks

CORS #9

Open josebyte opened 4 years ago

josebyte commented 4 years ago

Hi!

I am trying to get the json of a pdf resume using this endpoint: https://europass.cedefop.europa.eu/rest/v1/document/extraction

I tried from localhost and got a CORS error, and I see the same error in your demo. How can I make the request to your API without running into the CORS error?

On the demo page (https://europass.github.io/europasscv-parser-js/):

Access to XMLHttpRequest at 'https://europass.cedefop.europa.eu/rest/v1/document/extraction' from origin 'https://europass.github.io' has been blocked by CORS policy: Response to preflight request doesn't pass access control check: Redirect is not allowed for a preflight request.

ghost commented 3 years ago

Me too. The loader does not stop.


francesco14 commented 3 years ago

Same here. I'm starting to think that their APIs are no longer available. Is there a way to get this script to work? I'm developing on localhost and calling the EuropassParser() function with a Blob, and I get the CORS error.

paleimon commented 3 years ago

I have contacted the Europass Helpdesk via their contact form regarding this CORS error. I will update this issue with their response.

ghost commented 3 years ago

But has anyone managed to parse a CV somehow, even without the UI?

paleimon commented 3 years ago

Europe Direct Contact Centre Update:

Thank you for contacting the Europe Direct Contact Centre.

With the launch of the new Europass platform, the existing interoperability services were discontinued. The Europass team is working on new and improved interoperability services. More news on them will follow in the coming months on the interoperability pages of the Europass platform. We apologise for any inconvenience.

The https://europass.cedefop.europa.eu/rest/v1/document/extraction link is no longer available and it should redirect you to https://europa.eu/europass/en/interoperability-europass where you can find information about interoperability between Europass and other education, training and labour market providers. The GitHub account will remain as the source code.

We hope you find this information useful. Please contact us again if you have other questions about the European Union, its activities or institutions.

ghost commented 3 years ago

Yep, it does redirect, actually: https://europass.cedefop.europa.eu/rest/v1/document/extraction

So does this work now? https://europass.github.io/europasscv-parser-js/ It seems not.


https://github.com/europass/europasscv-parser-js/search?q=https%3A%2F%2Feuropass.cedefop.europa.eu%2Frest%2Fv1%2Fdocument%2Fextraction&type=
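
The redirect also explains the browser error quoted at the top of this issue: a CORS preflight (OPTIONS) request must not be answered with a redirect. As a rough sketch, the redirect can be checked outside the browser, for example in Node 18+ where `fetch` is available globally. The helper name `describeRedirect` and the injectable `fetchImpl` parameter are assumptions made for illustration; they are not part of this repository. In a browser, a manually handled redirect is returned as an opaque response, so this only gives useful output in a non-browser runtime.

```javascript
// Hypothetical helper (not part of europasscv-parser-js): reports whether a URL
// answers an OPTIONS request with an HTTP redirect.
// `fetchImpl` is injectable so the function can be exercised without network access.
async function describeRedirect(url, fetchImpl = fetch) {
  // redirect: "manual" asks fetch not to follow the redirect automatically.
  const response = await fetchImpl(url, { method: "OPTIONS", redirect: "manual" });
  const isRedirect = response.status >= 300 && response.status < 400;
  return {
    status: response.status,
    isRedirect,
    // The Location header is only meaningful for 3xx responses.
    location: isRedirect ? response.headers.get("location") : null,
  };
}

// Possible usage (requires network access, so it is left commented out):
// describeRedirect("https://europass.cedefop.europa.eu/rest/v1/document/extraction")
//   .then(info => console.log(info));
```

If `isRedirect` is true, no client-side change will make the browser request succeed, because the preflight itself is rejected before the real request is sent.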

PetrM97 commented 3 years ago

The new Europass CV is a PDF file with an attached XML file. This XML file can then be parsed to get the content. I have written a piece of code that uses the PDF.js library to parse the new Europass CV and write the name of the person to the console.

const loadingTask = pdfjsLib.getDocument("resume.pdf");
loadingTask.promise.then(pdf => {
    console.log('PDF loaded');
    // getAttachments() resolves to a lookup table of the PDF's named attachments.
    pdf.getAttachments().then(attachments => {
        console.log('Attachments loaded');
        // "attachment" is the key of the embedded XML in this file; it may differ in others.
        const b = new Blob([attachments.attachment.content], {type: "text/xml"});
        b.text().then(txt => {
            const parser = new DOMParser();
            const xmlDoc = parser.parseFromString(txt, "text/xml");
            // Positional lookup of the person node; this depends on the current XML layout.
            const person = xmlDoc.children[0].children[1].children[2].children[0];
            console.log(`Got Europass CV from ${person.children[0].textContent} ${person.children[1].textContent}`);
        });
    });
});

Alternatively, the pdfdetach utility from Poppler (available in most Linux distributions) can be used to extract the XML file. I am not affiliated with the Europass project, so the specification might change in the future.
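
Since the name of the attachment may vary between files, a slightly more defensive variant is to look for the first attachment whose filename ends in ".xml" instead of hardcoding the key. This is a minimal sketch, assuming the lookup table returned by PDF.js maps names to objects with `filename` and `content` (a `Uint8Array`) fields and that the XML is UTF-8 encoded. The function name `extractXmlAttachment` is hypothetical.

```javascript
// Hypothetical helper: picks the first XML attachment from the lookup table
// that PDF.js getAttachments() resolves to, and decodes it to a string.
// Assumption: the attachment bytes are UTF-8 encoded.
function extractXmlAttachment(attachments) {
  // Guard against a missing table (a PDF without attachments).
  if (!attachments) return null;
  for (const [key, attachment] of Object.entries(attachments)) {
    // Fall back to the table key if the entry carries no filename.
    const name = attachment.filename || key;
    if (name.toLowerCase().endsWith(".xml")) {
      return { name, xml: new TextDecoder("utf-8").decode(attachment.content) };
    }
  }
  return null; // no XML attachment found
}

// Possible usage with PDF.js (pdfjsLib must be loaded separately):
// pdfjsLib.getDocument("resume.pdf").promise
//   .then(pdf => pdf.getAttachments())
//   .then(attachments => console.log(extractXmlAttachment(attachments)));
```

The returned `xml` string can then be passed to `DOMParser` as in the snippet above.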

redtux commented 3 years ago

@macvag @europass1 Any news? If this project is deprecated, please archive the repository and add a warning to the readme indicating that this project has been discontinued (and why, and what alternatives exist). Thank you for your help!

europass1 commented 3 years ago

Dear Sir/Madam,

Thank you for your mail.

Cedefop is no longer involved in the management of the Europass platform.

You can send your feedback to the new team through

https://europa.eu/europass/en/contact-us

Kind regards,

Philippe Tissot

beegotsy commented 1 year ago

For anyone who stumbles across this in the future:

  1. go to https://mozilla.github.io/pdf.js/web/viewer.html
  2. upload your file (leftmost icon on the right of the header)
  3. open the sidebar (leftmost icon on the left of the header)
  4. click on the attachment icon (label "Show attachment")
  5. you will find an attachment called "Europass-XML-Attachment.xml".
  6. download the file

The schema may be inconsistent, and you will almost certainly need to unescape some encoded characters. You can easily find an XML to JSON converter, so that you can work with the file in plain JavaScript.
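
For the unescaping step, a small helper along these lines may be enough. It is only a sketch: it handles the five predefined XML entities plus numeric character references, and it assumes that is the kind of encoding found in the attachment, which I have not verified against the Europass schema. The name `unescapeXml` is hypothetical.

```javascript
// The five entities predefined by XML.
const NAMED_ENTITIES = { amp: "&", lt: "<", gt: ">", quot: '"', apos: "'" };

// Hypothetical helper: replaces predefined XML entities and numeric character
// references (decimal &#233; or hexadecimal &#xE9;) with the characters they encode.
// Unknown named entities and out-of-range code points are left untouched.
function unescapeXml(text) {
  return text.replace(/&(#x[0-9a-fA-F]+|#[0-9]+|[a-zA-Z]+);/g, (match, body) => {
    if (body[0] === "#") {
      const code = body[1] === "x"
        ? parseInt(body.slice(2), 16)
        : parseInt(body.slice(1), 10);
      // String.fromCodePoint throws above U+10FFFF, so keep the original text instead.
      return code <= 0x10ffff ? String.fromCodePoint(code) : match;
    }
    return Object.prototype.hasOwnProperty.call(NAMED_ENTITIES, body)
      ? NAMED_ENTITIES[body]
      : match;
  });
}

console.log(unescapeXml("Caf&#233; &amp; Bar"));
```

The replacement runs in a single pass, so text that was escaped twice (for example `&amp;lt;`) is unescaped only one level per call.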