ARM-software / CMSIS_5

CMSIS Version 5 Development Repository
http://arm-software.github.io/CMSIS_5/index.html
Apache License 2.0

org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 50; White spaces are required between publicId and systemId. #800

Closed: ilg-ul closed this issue 4 years ago

ilg-ul commented 4 years ago

Resolution: the problem was caused by a recent change at Keil, which added a redirect from http to https. Java's HttpURLConnection does not follow redirects that switch protocols, so such redirections must be followed manually.

The error message is caused by the SAX parser trying to parse the HTML returned together with the 302 response.
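
The explanation above can be checked offline. The following is a minimal sketch, not the plug-in's actual code: it feeds the first line of the redirect page (the `<!DOCTYPE ...>` line that shows up in the xmllint output further down this thread) to a default JAXP SAX parser. The class name `RedirectBodyRepro`, the helper `parseError` and the `<html></html>` placeholder are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

class RedirectBodyRepro {
    // First line of the HTML page that arrives with the redirect response,
    // copied from the xmllint output quoted later in this thread. The
    // <html> element after it is only a placeholder for the rest of the page.
    static final String REDIRECT_BODY =
        "<!DOCTYPE HTML PUBLIC \"-//IETF//DTD HTML 2.0//EN\">\n<html></html>\n";

    // Parses the text with a default JAXP SAX parser and returns the parse
    // error, or null when the text is well-formed XML.
    static SAXParseException parseError(String text) {
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(
                new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8)),
                new DefaultHandler());
            return null;
        } catch (SAXParseException e) {
            return e;
        } catch (SAXException | IOException | ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        SAXParseException e = parseError(REDIRECT_BODY);
        System.out.println(e == null
            ? "parsed without errors"
            : "line " + e.getLineNumber() + ", column " + e.getColumnNumber()
                + ": " + e.getMessage());
    }
}
```

An HTML 2.0 doctype carries a public identifier but no system identifier, which XML does not allow, so the parser is expected to give up on line 1, which is where the message quoted in this issue points.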


It looks like something changed recently in the index.pidx file, crashing the SAX parser:

Parsing "http://www.keil.com/pack/index.pidx"...
org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 50; White spaces are required between publicId and systemId.

The current file reads like:

<?xml version="1.0" encoding="UTF-8" ?> 
<index schemaVersion="1.1.0" xs:noNamespaceSchemaLocation="PackIndex.xsd" xmlns:xs="http://www.w3.org/2001/XMLSchema-instance">
<vendor>Keil</vendor>
<url>http://www.keil.com/pack/</url>
<timestamp>2020-01-14T04:02:51.9611227+00:00</timestamp>
<pindex>
  <pdsc url="http://www.keil.com/pack/" vendor="ARM" name="minar" version="1.0.0" />
  ...
  <pdsc url="http://mcu.holtek.com.tw/pack" vendor="Holtek" name="HT32_DFP" version="1.0.24" />
</pindex>
</index>

I would suspect that the PackIndex.xsd requires a full absolute URL.

JonatanAntoni commented 4 years ago

Hi @ilg-ul,

thanks for letting us know.

The index file seems to be in sync with the specification in the documentation. There are no such elements like publicId or systemId around. Validating the file against the schema doesn't show any issues.

Cheers, Jonatan

edriouk commented 4 years ago

Hi Liviu,

We do not face any problem. Our plug-ins first download index.pidx, then parse it without validating against the schema (the file is generated on the server side and is therefore guaranteed to match the xsd file).
I have googled for the message and the problem seems to be similar to this one: https://stackoverflow.com/questions/46943878/org-xml-sax-saxparseexception-white-spaces-are-required-between-publicid-and-sy

Best regards, Evgueni

JonatanAntoni commented 4 years ago

One issue could be the redirect that happens when accessing http://www.keil.com/pack/index.pidx. A file download using wget or curl -L does resolve the redirect correctly. But using a different implementation to access web resources might introduce weird effects.
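
For a Java client, the equivalent of `curl -L` has to be written by hand when the redirect switches from http to https, because HttpURLConnection only follows redirects within the same protocol. Below is a minimal sketch under that assumption; the class `RedirectFollower`, its helpers and the limit of 5 hops are hypothetical choices for illustration, not code from any of the plug-ins discussed here.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;

class RedirectFollower {
    // Arbitrary safety limit chosen for this sketch.
    static final int MAX_REDIRECTS = 5;

    // True for the 3xx status codes that point to another location.
    static boolean isRedirect(int status) {
        return status == 301 || status == 302 || status == 303
            || status == 307 || status == 308;
    }

    // Resolves a Location header value (absolute or relative) against
    // the URI that was requested.
    static URI resolve(URI requested, String location) {
        return requested.resolve(location);
    }

    // Opens an http/https URI, following redirects by hand so that a
    // change of protocol or host along the way is not a problem.
    static InputStream open(URI uri) throws IOException {
        for (int hop = 0; hop <= MAX_REDIRECTS; hop++) {
            HttpURLConnection conn =
                (HttpURLConnection) uri.toURL().openConnection();
            conn.setInstanceFollowRedirects(false);
            int status = conn.getResponseCode();
            if (!isRedirect(status)) {
                return conn.getInputStream();
            }
            String location = conn.getHeaderField("Location");
            conn.disconnect();
            if (location == null) {
                throw new IOException("Redirect without a Location header: " + uri);
            }
            uri = resolve(uri, location);
        }
        throw new IOException("Too many redirects");
    }

    public static void main(String[] args) throws IOException {
        // Pass the index URL as the first argument.
        try (InputStream in = open(URI.create(args[0]))) {
            System.out.println("first byte: " + in.read());
        }
    }
}
```

The stream returned by `open` can then be handed to the XML parser instead of the URL itself, so the parser never sees the body of the redirect response.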

ilg-ul commented 4 years ago

Can you check when the xs:noNamespaceSchemaLocation="PackIndex.xsd" attribute was added to the file? Before this change everything was fine.

The reason for the shown error is that the parser cannot reach the schema file; when using relative paths, like in your case, the schema file is expected to be in the same folder as the parsed file, and it is not, neither at http://www.keil.com/pack/PackIndex.xsd, nor in the local folder if the index is first downloaded locally.

However, the safest way is to use absolute URLs.

I'm not sure this attribute should be present here. It should be present in your development environment to validate the index, but once you make it public you force all parsers to validate the content at each access. Not nice.

~~My first suggestion is to remove this attribute.~~

If you decide to keep this attribute, please publish the schema in a public location and change the attribute to the absolute URL of the schema.

https://www.oreilly.com/library/view/xml-in-a/0596007647/re167.html
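
Whether the attribute actually triggers validation can be tested in isolation. This is a minimal sketch, assuming the default JAXP configuration: a parser obtained from `SAXParserFactory.newInstance()` is non-validating unless configured otherwise, so it should treat `xs:noNamespaceSchemaLocation` as an ordinary attribute and never look for PackIndex.xsd. The class `SchemaHintCheck` and the helper `countPdsc` are made up for illustration; the sample is a shortened copy of the index quoted at the top of this issue.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class SchemaHintCheck {
    // Shortened copy of index.pidx; no PackIndex.xsd is available anywhere.
    static final String SAMPLE_INDEX =
        "<?xml version=\"1.0\" encoding=\"UTF-8\" ?>\n"
        + "<index schemaVersion=\"1.1.0\""
        + " xs:noNamespaceSchemaLocation=\"PackIndex.xsd\""
        + " xmlns:xs=\"http://www.w3.org/2001/XMLSchema-instance\">\n"
        + "<vendor>Keil</vendor>\n"
        + "<pindex>\n"
        + "  <pdsc url=\"http://www.keil.com/pack/\" vendor=\"ARM\""
        + " name=\"minar\" version=\"1.0.0\" />\n"
        + "  <pdsc url=\"http://mcu.holtek.com.tw/pack\" vendor=\"Holtek\""
        + " name=\"HT32_DFP\" version=\"1.0.24\" />\n"
        + "</pindex>\n"
        + "</index>\n";

    // Parses the text with a default (non-validating) SAX parser and
    // returns the number of <pdsc> elements seen.
    static int countPdsc(String xml) {
        final int[] count = {0};
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.newSAXParser().parse(
                new InputSource(new StringReader(xml)),
                new DefaultHandler() {
                    @Override
                    public void startElement(String uri, String localName,
                                             String qName, Attributes attributes) {
                        if ("pdsc".equals(qName)) {
                            count[0]++;
                        }
                    }
                });
        } catch (SAXException | IOException | ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
        return count[0];
    }

    public static void main(String[] args) {
        System.out.println("pdsc entries: " + countPdsc(SAMPLE_INDEX));
    }
}
```

If this parses cleanly with no schema file in reach, the attribute alone is unlikely to explain the failure, which matches what the later tests in this thread found.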

ilg-ul commented 4 years ago

Our plug-ins first download index.pidx, then parse it without validating against the schema

Lucky you! Downloading and parsing locally seems to disable the schema validation.

Here are the latest tests. Parsing directly from the URL:

2020-01-17 12:02:27
Update packs job started.
Parsing "http://www.keil.com/pack/index.pidx"...
org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 50; White spaces are required between publicId and systemId.
File "/Users/ilg/Library/CMSIS-Packs/.cache/.content_www_keil_com_pack_index_pidx.xml" written.

Copying the file locally and parsing:

2020-01-17 12:04:57
Update packs job started.
Parsing "file:///Users/ilg/Downloads/index.pidx"...
Contributed 606 pack(s).

I first thought that the problem was introduced by updating the JDK to OpenJDK 13, but with the old 1.8 the behaviour was the same.

I have no idea how it worked before...

edriouk commented 4 years ago

The xs:noNamespaceSchemaLocation="PackIndex.xsd" was added to the index.pidx in October 2018.
I believe the problem is as described in the Stack Overflow article and is caused by the URL redirection that was introduced recently.

ilg-ul commented 4 years ago

I believe the problem is as described in the Stack Overflow article and is caused by the URL redirection that was introduced recently.

I understand that the redirection was added recently, but I do not think it causes the issue described at stackoverflow.

Here is the verbose curl output:

ilg@wks Downloads % curl -L http://www.keil.com/pack/index.pidx -o index.pidx -v
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0*   Trying 217.140.99.213...
* TCP_NODELAY set
* Connected to www.keil.com (217.140.99.213) port 80 (#0)
> GET /pack/index.pidx HTTP/1.1
> Host: www.keil.com
> User-Agent: curl/7.64.1
> Accept: */*
> 
< HTTP/1.1 302 Found
< Server: Microsoft-IIS/8.5
< Content-Type: text/html
< Date: Fri, 17 Jan 2020 10:20:30 GMT
< Location: https://sadevicepacksprodus.blob.core.windows.net/idxfile/index.pidx
< Connection: Keep-Alive
< X-UA-Compatible: IE=EDGE
< X-Powered-By: ASP.NET
< Content-Length: 7764
< 
* Ignoring the response-body
{ [6559 bytes data]
100  7764  100  7764    0     0  69945      0 --:--:-- --:--:-- --:--:-- 70581
* Connection #0 to host www.keil.com left intact
* Issue another request to this URL: 'https://sadevicepacksprodus.blob.core.windows.net/idxfile/index.pidx'
*   Trying 52.190.240.132...
* TCP_NODELAY set
* Connected to sadevicepacksprodus.blob.core.windows.net (52.190.240.132) port 443 (#1)
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
*   CAfile: /etc/ssl/cert.pem
  CApath: none
* TLSv1.2 (OUT), TLS handshake, Client hello (1):
} [255 bytes data]
* TLSv1.2 (IN), TLS handshake, Server hello (2):
{ [81 bytes data]
* TLSv1.2 (IN), TLS handshake, Certificate (11):
{ [5238 bytes data]
* TLSv1.2 (IN), TLS handshake, Server key exchange (12):
{ [333 bytes data]
* TLSv1.2 (IN), TLS handshake, Server finished (14):
{ [4 bytes data]
* TLSv1.2 (OUT), TLS handshake, Client key exchange (16):
} [70 bytes data]
* TLSv1.2 (OUT), TLS change cipher, Change cipher spec (1):
} [1 bytes data]
* TLSv1.2 (OUT), TLS handshake, Finished (20):
} [16 bytes data]
* TLSv1.2 (IN), TLS change cipher, Change cipher spec (1):
{ [1 bytes data]
* TLSv1.2 (IN), TLS handshake, Finished (20):
{ [16 bytes data]
* SSL connection using TLSv1.2 / ECDHE-RSA-AES256-GCM-SHA384
* ALPN, server did not agree to a protocol
* Server certificate:
*  subject: CN=*.blob.core.windows.net
*  start date: May  2 00:41:38 2019 GMT
*  expire date: May  2 00:41:38 2021 GMT
*  subjectAltName: host "sadevicepacksprodus.blob.core.windows.net" matched cert's "*.blob.core.windows.net"
*  issuer: C=US; ST=Washington; L=Redmond; O=Microsoft Corporation; OU=Microsoft IT; CN=Microsoft IT TLS CA 4
*  SSL certificate verify ok.
> GET /idxfile/index.pidx HTTP/1.1
> Host: sadevicepacksprodus.blob.core.windows.net
> User-Agent: curl/7.64.1
> Accept: */*
> 
< HTTP/1.1 200 OK
< Content-Length: 76035
< Content-Type: text/plain
< Last-Modified: Tue, 14 Jan 2020 04:02:51 GMT
< ETag: 0x8D798A6A0712CCF
< Server: Windows-Azure-Blob/1.0 Microsoft-HTTPAPI/2.0
< x-ms-request-id: aaf11b0b-401e-000f-361f-cd00f4000000
< x-ms-version: 2009-09-19
< x-ms-lease-status: unlocked
< x-ms-blob-type: AppendBlob
< x-ms-blob-committed-block-count: 1
< Date: Fri, 17 Jan 2020 10:20:31 GMT
< 
{ [15980 bytes data]
100 76035  100 76035    0     0  54466      0  0:00:01  0:00:01 --:--:-- 79534
* Connection #1 to host sadevicepacksprodus.blob.core.windows.net left intact
* Closing connection 1
* Closing connection 0
ilg@wks Downloads % 

The file is served as text/plain, with no charset=utf-8, and the downloaded file has no BOM; it starts directly with ASCII characters:

ilg@wks Downloads % hexdump  /Users/ilg/Downloads/index.pidx  
0000000 3c 3f 78 6d 6c 20 76 65 72 73 69 6f 6e 3d 22 31
0000010 2e 30 22 20 65 6e 63 6f 64 69 6e 67 3d 22 55 54

JonatanAntoni commented 4 years ago

Yes, you're right, the file has no BOM but it should be proper UTF-8 encoding nevertheless.

Might it happen that the stream reader you are using fails to detect proper encoding if there is no BOM right at the start? Any chance to force the stream reader to use UTF-8?
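
The BOM question can be tested without the network. This is a minimal sketch, assuming the parser is given a raw byte stream: an XML parser is expected to work out the encoding from the first bytes and the `encoding="UTF-8"` declaration, with or without a BOM. The class `EncodingCheck`, the helper `vendorOf` and the vendor name with a non-ASCII letter are hypothetical; the real index quoted above contains only ASCII vendor names.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class EncodingCheck {
    // UTF-8 bytes without a BOM; "M\u00fcller" is a made-up vendor name
    // used only to get a multi-byte character into the stream.
    static final byte[] SAMPLE =
        ("<?xml version=\"1.0\" encoding=\"UTF-8\" ?>\n"
            + "<index><vendor>M\u00fcller</vendor></index>\n")
            .getBytes(StandardCharsets.UTF_8);

    // Parses the bytes and returns the text found inside <vendor>.
    static String vendorOf(byte[] xmlBytes) {
        final StringBuilder text = new StringBuilder();
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(
                new ByteArrayInputStream(xmlBytes),
                new DefaultHandler() {
                    private boolean inVendor;

                    @Override
                    public void startElement(String uri, String localName,
                                             String qName, Attributes attributes) {
                        inVendor = "vendor".equals(qName);
                    }

                    @Override
                    public void endElement(String uri, String localName,
                                           String qName) {
                        inVendor = false;
                    }

                    @Override
                    public void characters(char[] ch, int start, int length) {
                        // May be called several times for one text node.
                        if (inVendor) {
                            text.append(ch, start, length);
                        }
                    }
                });
        } catch (SAXException | IOException | ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        System.out.println("vendor: " + vendorOf(SAMPLE));
    }
}
```

If the vendor name comes back intact, the missing BOM is not what breaks the parser.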

ilg-ul commented 4 years ago

Might it happen that the stream reader you are using fails to detect proper encoding if there is no BOM right at the start? Any chance to force the stream reader to use UTF-8?

Please note that exactly the same file is parsed by exactly the same code properly when copied locally. It should have nothing to do with encoding.

And parsing was ok until recently, when something changed on your side.

Most probably the issue is caused by the validation, which is not possible from your URL.

ilg-ul commented 4 years ago

caused by URL redirection that was made recently.

Evgueni seems to be right. I uploaded the index.pidx to GitHub, and from there the Java parser can process it:

Parsing "https://github.com/ilg-ul/test-sax-validation/raw/master/index.pidx"...
Contributed 606 pack(s).

So my guess that it has something to do with validation was not confirmed.

A curl session looks like:

ilg@wks ~ % curl -L -o index2.pidx https://github.com/ilg-ul/test-sax-validation/raw/master/index.pidx -v
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0*   Trying 140.82.118.3...
* TCP_NODELAY set
* Connected to github.com (140.82.118.3) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
*   CAfile: /etc/ssl/cert.pem
  CApath: none
* TLSv1.2 (OUT), TLS handshake, Client hello (1):
} [224 bytes data]
* TLSv1.2 (IN), TLS handshake, Server hello (2):
{ [108 bytes data]
* TLSv1.2 (IN), TLS handshake, Certificate (11):
{ [3085 bytes data]
* TLSv1.2 (IN), TLS handshake, Server key exchange (12):
{ [300 bytes data]
* TLSv1.2 (IN), TLS handshake, Server finished (14):
{ [4 bytes data]
* TLSv1.2 (OUT), TLS handshake, Client key exchange (16):
} [37 bytes data]
* TLSv1.2 (OUT), TLS change cipher, Change cipher spec (1):
} [1 bytes data]
* TLSv1.2 (OUT), TLS handshake, Finished (20):
} [16 bytes data]
* TLSv1.2 (IN), TLS change cipher, Change cipher spec (1):
{ [1 bytes data]
* TLSv1.2 (IN), TLS handshake, Finished (20):
{ [16 bytes data]
* SSL connection using TLSv1.2 / ECDHE-RSA-AES128-GCM-SHA256
* ALPN, server accepted to use http/1.1
* Server certificate:
*  subject: businessCategory=Private Organization; jurisdictionCountryName=US; jurisdictionStateOrProvinceName=Delaware; serialNumber=5157550; C=US; ST=California; L=San Francisco; O=GitHub, Inc.; CN=github.com
*  start date: May  8 00:00:00 2018 GMT
*  expire date: Jun  3 12:00:00 2020 GMT
*  subjectAltName: host "github.com" matched cert's "github.com"
*  issuer: C=US; O=DigiCert Inc; OU=www.digicert.com; CN=DigiCert SHA2 Extended Validation Server CA
*  SSL certificate verify ok.
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0> GET /ilg-ul/test-sax-validation/raw/master/index.pidx HTTP/1.1
> Host: github.com
> User-Agent: curl/7.64.1
> Accept: */*
> 
< HTTP/1.1 302 Found
< Date: Fri, 17 Jan 2020 13:02:30 GMT
< Content-Type: text/html; charset=utf-8
< Transfer-Encoding: chunked
< Server: GitHub.com
< Status: 302 Found
< Vary: X-PJAX
< Access-Control-Allow-Origin: https://render.githubusercontent.com
< Location: https://raw.githubusercontent.com/ilg-ul/test-sax-validation/master/index.pidx
< Cache-Control: no-cache
< Strict-Transport-Security: max-age=31536000; includeSubdomains; preload
< X-Frame-Options: deny
< X-Content-Type-Options: nosniff
< X-XSS-Protection: 1; mode=block
< Expect-CT: max-age=2592000, report-uri="https://api.github.com/_private/browser/errors"
< Content-Security-Policy: default-src 'none'; base-uri 'self'; block-all-mixed-content; connect-src 'self' uploads.github.com www.githubstatus.com collector.githubapp.com api.github.com www.google-analytics.com github-cloud.s3.amazonaws.com github-production-repository-file-5c1aeb.s3.amazonaws.com github-production-upload-manifest-file-7fdce7.s3.amazonaws.com github-production-user-asset-6210df.s3.amazonaws.com wss://live.github.com; font-src github.githubassets.com; form-action 'self' github.com gist.github.com; frame-ancestors 'none'; frame-src render.githubusercontent.com; img-src 'self' data: github.githubassets.com identicons.github.com collector.githubapp.com github-cloud.s3.amazonaws.com *.githubusercontent.com; manifest-src 'self'; media-src 'none'; script-src github.githubassets.com; style-src 'unsafe-inline' github.githubassets.com
< Age: 0
< Vary: Accept-Encoding
< X-GitHub-Request-Id: DDEB:F596:27E1B4C:3B5193D:5E21B065
< 
* Ignoring the response-body
{ [155 bytes data]
100   144    0   144    0     0    331      0 --:--:-- --:--:-- --:--:--   330
* Connection #0 to host github.com left intact
* Issue another request to this URL: 'https://raw.githubusercontent.com/ilg-ul/test-sax-validation/master/index.pidx'
*   Trying 151.101.16.133...
* TCP_NODELAY set
* Connected to raw.githubusercontent.com (151.101.16.133) port 443 (#1)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
*   CAfile: /etc/ssl/cert.pem
  CApath: none
* TLSv1.2 (OUT), TLS handshake, Client hello (1):
} [239 bytes data]
* TLSv1.2 (IN), TLS handshake, Server hello (2):
{ [108 bytes data]
* TLSv1.2 (IN), TLS handshake, Certificate (11):
{ [3182 bytes data]
* TLSv1.2 (IN), TLS handshake, Server key exchange (12):
{ [300 bytes data]
* TLSv1.2 (IN), TLS handshake, Server finished (14):
{ [4 bytes data]
* TLSv1.2 (OUT), TLS handshake, Client key exchange (16):
} [37 bytes data]
* TLSv1.2 (OUT), TLS change cipher, Change cipher spec (1):
} [1 bytes data]
* TLSv1.2 (OUT), TLS handshake, Finished (20):
} [16 bytes data]
* TLSv1.2 (IN), TLS change cipher, Change cipher spec (1):
{ [1 bytes data]
* TLSv1.2 (IN), TLS handshake, Finished (20):
{ [16 bytes data]
* SSL connection using TLSv1.2 / ECDHE-RSA-AES128-GCM-SHA256
* ALPN, server accepted to use http/1.1
* Server certificate:
*  subject: C=US; ST=California; L=San Francisco; O=GitHub, Inc.; CN=www.github.com
*  start date: Mar 23 00:00:00 2017 GMT
*  expire date: May 13 12:00:00 2020 GMT
*  subjectAltName: host "raw.githubusercontent.com" matched cert's "*.githubusercontent.com"
*  issuer: C=US; O=DigiCert Inc; OU=www.digicert.com; CN=DigiCert SHA2 High Assurance Server CA
*  SSL certificate verify ok.
> GET /ilg-ul/test-sax-validation/master/index.pidx HTTP/1.1
> Host: raw.githubusercontent.com
> User-Agent: curl/7.64.1
> Accept: */*
> 
< HTTP/1.1 200 OK
< Content-Security-Policy: default-src 'none'; style-src 'unsafe-inline'; sandbox
< Strict-Transport-Security: max-age=31536000
< X-Content-Type-Options: nosniff
< X-Frame-Options: deny
< X-XSS-Protection: 1; mode=block
< ETag: W/"8c5f775585a16c5e8f27556fa1bd47117a66f17ae056af2b72affdaec243caa0"
< Content-Type: text/plain; charset=utf-8
< Cache-Control: max-age=300
< X-Geo-Block-List:
< Via: 1.1 varnish-v4
< X-GitHub-Request-Id: 3CA4:22F3:0333:03E3:5E21AF60
< Content-Length: 75423
< Accept-Ranges: bytes
< Date: Fri, 17 Jan 2020 13:02:30 GMT
< Via: 1.1 varnish
< Connection: keep-alive
< X-Served-By: cache-lcy19264-LCY
< X-Cache: HIT
< X-Cache-Hits: 1
< X-Timer: S1579266150.389912,VS0,VE1
< Vary: Authorization,Accept-Encoding
< Access-Control-Allow-Origin: *
< X-Fastly-Request-ID: 3196e178173b2a09b9bcb0fe77ef1a58b0687a1b
< Expires: Fri, 17 Jan 2020 13:07:30 GMT
< Source-Age: 261
< 
{ [1875 bytes data]
100 75423  100 75423    0     0   104k      0 --:--:-- --:--:-- --:--:--  104k
* Connection #1 to host raw.githubusercontent.com left intact
* Closing connection 0
* Closing connection 1
ilg@wks ~ % 

The one difference that I can spot is that GitHub responds with Content-Type: text/plain; charset=utf-8, while your server responds only with Content-Type: text/plain.

Could you find a fix for this?

JonatanAntoni commented 4 years ago

Liviu,

I asked the web hosting team if we can change the reported Content-Type to text/xml; charset=utf-8. Not sure what type of influence we have here since the files are shipped by Microsoft Azure.

Cheers, Jonatan

ilg-ul commented 4 years ago

the files are shipped by Microsoft Azure

:-(

Long live Microsoft!

ilg-ul commented 4 years ago

Based on further tests, configuring the plug-ins to use the real address (https://sadevicepacksprodus.blob.core.windows.net/idxfile/index.pidx) avoids the problem, so the culprit is the redirection, not the content type.

Could you compare the current redirection setup with the previous one, which worked? Perhaps you can identify the problem that way.

For completeness, the curl session looks like this:

ilg@wks tmp % curl -L -o index3.pidx https://sadevicepacksprodus.blob.core.windows.net/idxfile/index.pidx -v
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0*   Trying 52.190.240.132...
* TCP_NODELAY set
* Connected to sadevicepacksprodus.blob.core.windows.net (52.190.240.132) port 443 (#0)
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
*   CAfile: /etc/ssl/cert.pem
  CApath: none
* TLSv1.2 (OUT), TLS handshake, Client hello (1):
} [255 bytes data]
* TLSv1.2 (IN), TLS handshake, Server hello (2):
{ [81 bytes data]
* TLSv1.2 (IN), TLS handshake, Certificate (11):
{ [5238 bytes data]
* TLSv1.2 (IN), TLS handshake, Server key exchange (12):
{ [333 bytes data]
* TLSv1.2 (IN), TLS handshake, Server finished (14):
{ [4 bytes data]
* TLSv1.2 (OUT), TLS handshake, Client key exchange (16):
} [70 bytes data]
* TLSv1.2 (OUT), TLS change cipher, Change cipher spec (1):
} [1 bytes data]
* TLSv1.2 (OUT), TLS handshake, Finished (20):
} [16 bytes data]
* TLSv1.2 (IN), TLS change cipher, Change cipher spec (1):
{ [1 bytes data]
* TLSv1.2 (IN), TLS handshake, Finished (20):
{ [16 bytes data]
* SSL connection using TLSv1.2 / ECDHE-RSA-AES256-GCM-SHA384
* ALPN, server did not agree to a protocol
* Server certificate:
*  subject: CN=*.blob.core.windows.net
*  start date: May  2 00:41:38 2019 GMT
*  expire date: May  2 00:41:38 2021 GMT
*  subjectAltName: host "sadevicepacksprodus.blob.core.windows.net" matched cert's "*.blob.core.windows.net"
*  issuer: C=US; ST=Washington; L=Redmond; O=Microsoft Corporation; OU=Microsoft IT; CN=Microsoft IT TLS CA 4
*  SSL certificate verify ok.
> GET /idxfile/index.pidx HTTP/1.1
> Host: sadevicepacksprodus.blob.core.windows.net
> User-Agent: curl/7.64.1
> Accept: */*
> 
< HTTP/1.1 200 OK
< Content-Length: 76375
< Content-Type: text/plain
< Last-Modified: Sat, 18 Jan 2020 04:01:56 GMT
< ETag: 0x8D79BCB28BA6D49
< Server: Windows-Azure-Blob/1.0 Microsoft-HTTPAPI/2.0
< x-ms-request-id: 42b49ce6-801e-007f-667d-cf7330000000
< x-ms-version: 2009-09-19
< x-ms-lease-status: unlocked
< x-ms-blob-type: AppendBlob
< x-ms-blob-committed-block-count: 1
< Date: Mon, 20 Jan 2020 10:33:14 GMT
< 
{ [15980 bytes data]
100 76375  100 76375    0     0  62911      0  0:00:01  0:00:01 --:--:-- 62911
* Connection #0 to host sadevicepacksprodus.blob.core.windows.net left intact
* Closing connection 0
ilg@wks tmp % 

ilg-ul commented 4 years ago

Any estimate when this issue will be addressed?

As a workaround, I have asked users to reconfigure their Eclipse installations to use the windows.net URL, but this is not a long-term solution.

JonatanAntoni commented 4 years ago

In your analysis above the file still gets delivered as text/plain. I don't understand what the difference is, from your client's point of view, between being redirected and accessing the final URL directly. I doubt simply changing the content type to text/xml; charset=utf-8 fixes your issue.

ilg-ul commented 4 years ago

I doubt simply changing the content type to text/xml; charset=utf-8 fixes your issue

First, this is not my issue, I use the XML SAX parser available in the Oracle JDK in the simplest and most obvious configuration.

If I pass it the 'keil.com' URL, it fails; if I pass the windows.net URL, it passes; if I copy the file locally and pass the local URL, the parser passes again.

The content type seems to have no importance.

The problem is the new Microsoftish redirection, which confuses the Java parser.

If you think that the problem is not real simply because users of your CMSIS Eclipse plug-ins do not feel the pain, you are wrong, because Evgueni took a different path and copied the file locally (thus processing the redirect in a more fortunate context), but the problem is there for anyone trying to parse the file directly from the URL.

Please compare the current redirection setup with the previous one, which worked, and fix the problem.

JonatanAntoni commented 4 years ago

Our web team is investigating the issue. But the solution we pointed out in the first place won't be enough. I cannot give you an estimate, but probably not before end of January.

ilg-ul commented 4 years ago

the solution we pointed out in the first place won't be enough

If you mean fixing the content type, yes, I guess that won't make any difference. Check the redirects.

cdwilson commented 4 years ago

Quick example showing the redirect is causing issues:

$ curl -s https://sadevicepacksprodus.blob.core.windows.net/idxfile/index.pidx | xmllint --noout --schema PackIndex.xsd -
- validates
$ curl -s https://www.keil.com/pack/index.pidx | xmllint --noout --schema PackIndex.xsd -
-:1: parser error : Space required after the Public Identifier
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
                                                 ^
-:1: parser error : SystemLiteral " or ' expected
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
                                                 ^
-:1: parser error : SYSTEM or PUBLIC, the URI is missing
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
                                                 ^
-:2: parser error : error parsing attribute name
ts}if("function"==typeof __nr_require)return __nr_require;for(var i=0;i<t.length
                                                                               ^
-:2: parser error : attributes construct error
ts}if("function"==typeof __nr_require)return __nr_require;for(var i=0;i<t.length
                                                                               ^
-:2: parser error : Couldn't find end of Start Tag t.length line 2
ts}if("function"==typeof __nr_require)return __nr_require;for(var i=0;i<t.length
                                                                               ^
-:2: parser error : xmlParseEntityRef: no name
n("ee").get("tracer"),u=n("loader"),s=NREUM;"undefined"==typeof window.newrelic&
                                                                               ^
-:2: parser error : xmlParseEntityRef: no name
("ee").get("tracer"),u=n("loader"),s=NREUM;"undefined"==typeof window.newrelic&&
                                                                               ^
-:2: parser error : xmlParseEntityRef: no name
nction(n,e){m[e]=i(l+e)}),newrelic.noticeError=function(n,e){"string"==typeof n&
                                                                               ^
-:2: parser error : xmlParseEntityRef: no name
ction(n,e){m[e]=i(l+e)}),newrelic.noticeError=function(n,e){"string"==typeof n&&
                                                                               ^
-:2: parser error : xmlParseEntityRef: no name
me?a("timing",["fp",Math.floor(n.startTime)]):"first-contentful-paint"===n.name&
                                                                               ^
-:2: parser error : EntityRef: expecting ';'
?a("timing",["fp",Math.floor(n.startTime)]):"first-contentful-paint"===n.name&&a
                                                                               ^
-:2: parser error : xmlParseEntityRef: no name
&&a("timing",["fcp",Math.floor(n.startTime)])})}function i(n){if(n instanceof c&
                                                                               ^
-:2: parser error : xmlParseEntityRef: no name
&a("timing",["fcp",Math.floor(n.startTime)])})}function i(n){if(n instanceof c&&
                                                                               ^
-:2: parser error : xmlParseEntityRef: no name
-t:f.now()-t,s=!0,a("timing",["fi",t,{type:n.type,fid:e}])}}if(!("init"in NREUM&
                                                                               ^
-:2: parser error : xmlParseEntityRef: no name
t:f.now()-t,s=!0,a("timing",["fi",t,{type:n.type,fid:e}])}}if(!("init"in NREUM&&
                                                                               ^
-:2: parser error : xmlParseEntityRef: no name
",t,{type:n.type,fid:e}])}}if(!("init"in NREUM&&"page_view_timing"in NREUM.init&
                                                                               ^
-:2: parser error : xmlParseEntityRef: no name
,t,{type:n.type,fid:e}])}}if(!("init"in NREUM&&"page_view_timing"in NREUM.init&&
                                                                               ^
-:2: parser error : xmlParseEntityRef: no name
NREUM&&"page_view_timing"in NREUM.init&&"enabled"in NREUM.init.page_view_timing&
                                                                               ^
-:2: parser error : EntityRef: expecting ';'
it&&"enabled"in NREUM.init.page_view_timing&&NREUM.init.page_view_timing.enabled
                                                                               ^
-:2: parser error : xmlParseEntityRef: no name
ar o,a=n("handle"),f=n("loader"),c=NREUM.o.EV;if("PerformanceObserver"in window&
                                                                               ^
-:2: parser error : xmlParseEntityRef: no name
r o,a=n("handle"),f=n("loader"),c=NREUM.o.EV;if("PerformanceObserver"in window&&
                                                                               ^
-:2: parser error : error parsing attribute name
(!e)return!0;if(!o)return!1;for(var t=o.split("."),r=e.split("."),a=0;a<r.length
                                                                               ^
-:2: parser error : attributes construct error
(!e)return!0;if(!o)return!1;for(var t=o.split("."),r=e.split("."),a=0;a<r.length
                                                                               ^
-:2: parser error : Couldn't find end of Start Tag r.length line 2
(!e)return!0;if(!o)return!1;for(var t=o.split("."),r=e.split("."),a=0;a<r.length
                                                                               ^
-:2: parser error : xmlParseEntityRef: no name
\S+)\s+Safari/;if(navigator.userAgent){var f=navigator.userAgent,c=f.match(a);c&
                                                                               ^
-:2: parser error : EntityRef: expecting ';'
ari/;if(navigator.userAgent){var f=navigator.userAgent,c=f.match(a);c&&f.indexOf
                                                                               ^
-:2: parser error : xmlParseEntityRef: no name
r.userAgent){var f=navigator.userAgent,c=f.match(a);c&&f.indexOf("Chrome")===-1&
                                                                               ^
-:2: parser error : EntityRef: expecting ';'
t){var f=navigator.userAgent,c=f.match(a);c&&f.indexOf("Chrome")===-1&&f.indexOf
                                                                               ^
-:2: parser error : xmlParseEntityRef: no name
.userAgent,c=f.match(a);c&&f.indexOf("Chrome")===-1&&f.indexOf("Chromium")===-1&
                                                                               ^
-:2: parser error : xmlParseEntityRef: no name
userAgent,c=f.match(a);c&&f.indexOf("Chrome")===-1&&f.indexOf("Chromium")===-1&&
                                                                               ^
-:2: parser error : xmlParseEntityRef: no name
{}],4:[function(n,e,t){function r(n,e){var t=[],r="",o=0;for(r in n)i.call(n,r)&
                                                                               ^
-:2: parser error : xmlParseEntityRef: no name
}],4:[function(n,e,t){function r(n,e){var t=[],r="",o=0;for(r in n)i.call(n,r)&&
                                                                               ^
-:2: parser error : xmlParseEntityRef: no name
orts=r},{}],5:[function(n,e,t){function r(n,e,t){e||(e=0),"undefined"==typeof t&
                                                                               ^
-:2: parser error : xmlParseEntityRef: no name
rts=r},{}],5:[function(n,e,t){function r(n,e,t){e||(e=0),"undefined"==typeof t&&
                                                                               ^
-:2: parser error : StartTag: invalid element name
||(e=0),"undefined"==typeof t&&(t=n?n.length:0);for(var r=-1,i=t-e||0,o=Array(i<
                                                                               ^
-:2: parser error : error parsing attribute name
efined"==typeof t&&(t=n?n.length:0);for(var r=-1,i=t-e||0,o=Array(i<0?0:i);++r<i
                                                                               ^
-:2: parser error : attributes construct error
efined"==typeof t&&(t=n?n.length:0);for(var r=-1,i=t-e||0,o=Array(i<0?0:i);++r<i
                                                                               ^
-:2: parser error : Couldn't find end of Start Tag i line 2
efined"==typeof t&&(t=n?n.length:0);for(var r=-1,i=t-e||0,o=Array(i<0?0:i);++r<i
                                                                               ^
-:2: parser error : xmlParseEntityRef: no name
{}],6:[function(n,e,t){e.exports={exists:"undefined"!=typeof window.performance&
                                                                               ^
-:2: parser error : EntityRef: expecting ';'
xports={exists:"undefined"!=typeof window.performance&&window.performance.timing
                                                                               ^
-:2: parser error : xmlParseEntityRef: no name
ports={exists:"undefined"!=typeof window.performance&&window.performance.timing&
                                                                               ^
-:2: parser error : xmlParseEntityRef: no name
orts={exists:"undefined"!=typeof window.performance&&window.performance.timing&&
                                                                               ^
-:2: parser error : xmlParseEntityRef: no name
rt}},{}],ee:[function(n,e,t){function r(){}function i(n){function e(n){return n&
                                                                               ^
-:2: parser error : EntityRef: expecting ';'
}},{}],ee:[function(n,e,t){function r(){}function i(n){function e(n){return n&&n
                                                                               ^
-:2: parser error : xmlParseEntityRef: no name
turn n&&n instanceof r?n:n?c(n,f,o):o()}function t(t,r,i,o){if(!d.aborted||o){n&
                                                                               ^
-:2: parser error : EntityRef: expecting ';'
rn n&&n instanceof r?n:n?c(n,f,o):o()}function t(t,r,i,o){if(!d.aborted||o){n&&n
                                                                               ^
-:2: parser error : error parsing attribute name
(t,r,i,o){if(!d.aborted||o){n&&n(t,r,i);for(var a=e(i),f=v(t),c=f.length,u=0;u<c
                                                                               ^
-:2: parser error : attributes construct error
(t,r,i,o){if(!d.aborted||o){n&&n(t,r,i);for(var a=e(i),f=v(t),c=f.length,u=0;u<c
                                                                               ^
-:2: parser error : Couldn't find end of Start Tag c line 2
(t,r,i,o){if(!d.aborted||o){n&&n(t,r,i);for(var a=e(i),f=v(t),c=f.length,u=0;u<c
                                                                               ^
-:2: parser error : xmlParseEntityRef: no name
var a=e(i),f=v(t),c=f.length,u=0;u<c;u++)f[u].apply(a,r);var p=s[y[t]];return p&
                                                                               ^
-:2: parser error : EntityRef: expecting ';'
(i),f=v(t),c=f.length,u=0;u<c;u++)f[u].apply(a,r);var p=s[y[t]];return p&&p.push
                                                                               ^
-:2: parser error : error parsing attribute name
(n,e){h[n]=v(n).concat(e)}function m(n,e){var t=h[n];if(t)for(var r=0;r<t.length
                                                                               ^
-:2: parser error : attributes construct error
(n,e){h[n]=v(n).concat(e)}function m(n,e){var t=h[n];if(t)for(var r=0;r<t.length
                                                                               ^
-:2: parser error : Couldn't find end of Start Tag t.length line 2
(n,e){h[n]=v(n).concat(e)}function m(n,e){var t=h[n];if(t)for(var r=0;r<t.length
                                                                               ^
-:2: parser error : xmlParseEntityRef: no name
).concat(e)}function m(n,e){var t=h[n];if(t)for(var r=0;r<t.length;r++)t[r]===e&
                                                                               ^
-:2: parser error : EntityRef: expecting ';'
e)}function m(n,e){var t=h[n];if(t)for(var r=0;r<t.length;r++)t[r]===e&&t.splice
                                                                               ^
-:2: parser error : xmlParseEntityRef: no name
,aborted:!1};return b}function o(){return new r}function a(){(s.api||s.feature)&
                                                                               ^
-:2: parser error : xmlParseEntityRef: no name
aborted:!1};return b}function o(){return new r}function a(){(s.api||s.feature)&&
                                                                               ^
-:2: parser error : xmlParseEntityRef: no name
function r(n,e,t){if(i.call(n,e))return n[e];var r=t();if(Object.defineProperty&
                                                                               ^
-:2: parser error : EntityRef: expecting ';'
,e,t){if(i.call(n,e))return n[e];var r=t();if(Object.defineProperty&&Object.keys
                                                                               ^
-:2: parser error : xmlParseEntityRef: no name
=i},{}],id:[function(n,e,t){function r(n){var e=typeof n;return!n||"object"!==e&
                                                                               ^
-:2: parser error : xmlParseEntityRef: no name
i},{}],id:[function(n,e,t){function r(n){var e=typeof n;return!n||"object"!==e&&
                                                                               ^
-:2: parser error : xmlParseEntityRef: no name
NREUM.info,e=l.getElementsByTagName("script")[0];if(setTimeout(s.abort,3e4),!(n&
                                                                               ^
-:2: parser error : EntityRef: expecting ';'
l.getElementsByTagName("script")[0];if(setTimeout(s.abort,3e4),!(n&&n.licenseKey
                                                                               ^
-:2: parser error : xmlParseEntityRef: no name
.getElementsByTagName("script")[0];if(setTimeout(s.abort,3e4),!(n&&n.licenseKey&
                                                                               ^
-:2: parser error : EntityRef: expecting ';'
gName("script")[0];if(setTimeout(s.abort,3e4),!(n&&n.licenseKey&&n.applicationID
                                                                               ^
-:2: parser error : xmlParseEntityRef: no name
Name("script")[0];if(setTimeout(s.abort,3e4),!(n&&n.licenseKey&&n.applicationID&
                                                                               ^
-:2: parser error : EntityRef: expecting ';'
me("script")[0];if(setTimeout(s.abort,3e4),!(n&&n.licenseKey&&n.applicationID&&e
                                                                               ^
-:2: parser error : xmlParseEntityRef: no name
+n.agent,e.parentNode.insertBefore(t,e)}}function i(){"complete"===l.readyState&
                                                                               ^
-:2: parser error : EntityRef: expecting ';'
.agent,e.parentNode.insertBefore(t,e)}}function i(){"complete"===l.readyState&&o
                                                                               ^
-:2: parser error : xmlParseEntityRef: no name
){c("mark",["domContent",a()+E.offset],null,"api")}function a(){return O.exists&
                                                                               ^
-:2: parser error : EntityRef: expecting ';'
Content",a()+E.offset],null,"api")}function a(){return O.exists&&performance.now
                                                                               ^
-:2: parser error : xmlParseEntityRef: no name
window,l=d.document,m="addEventListener",v="attachEvent",g=d.XMLHttpRequest,w=g&
                                                                               ^
-:2: parser error : Entity 'g.prototype' not defined
cument,m="addEventListener",v="attachEvent",g=d.XMLHttpRequest,w=g&&g.prototype;
                                                                               ^
-:2: parser error : xmlParseEntityRef: no name
errorBeacon:"bam.nr-data.net",agent:"js-agent.newrelic.com/nr-1158.min.js"},b=g&
                                                                               ^
-:2: parser error : EntityRef: expecting ';'
rorBeacon:"bam.nr-data.net",agent:"js-agent.newrelic.com/nr-1158.min.js"},b=g&&w
                                                                               ^
-:2: parser error : xmlParseEntityRef: no name
orBeacon:"bam.nr-data.net",agent:"js-agent.newrelic.com/nr-1158.min.js"},b=g&&w&
                                                                               ^
-:2: parser error : EntityRef: expecting ';'
Beacon:"bam.nr-data.net",agent:"js-agent.newrelic.com/nr-1158.min.js"},b=g&&w&&w
                                                                               ^
-:2: parser error : xmlParseEntityRef: no name
on:"bam.nr-data.net",agent:"js-agent.newrelic.com/nr-1158.min.js"},b=g&&w&&w[m]&
                                                                               ^
-:2: parser error : xmlParseEntityRef: no name
n:"bam.nr-data.net",agent:"js-agent.newrelic.com/nr-1158.min.js"},b=g&&w&&w[m]&&
                                                                               ^
-:2: parser error : xmlParseEntityRef: no name
");var x=0,O=n(6)},{}],"wrap-function":[function(n,e,t){function r(n){return!(n&
                                                                               ^
-:2: parser error : EntityRef: expecting ';'
;var x=0,O=n(6)},{}],"wrap-function":[function(n,e,t){function r(n){return!(n&&n
                                                                               ^
-:2: parser error : xmlParseEntityRef: no name
"wrap-function":[function(n,e,t){function r(n){return!(n&&n instanceof Function&
                                                                               ^
-:2: parser error : EntityRef: expecting ';'
nction":[function(n,e,t){function r(n){return!(n&&n instanceof Function&&n.apply
                                                                               ^
-:2: parser error : xmlParseEntityRef: no name
ction":[function(n,e,t){function r(n){return!(n&&n instanceof Function&&n.apply&
                                                                               ^
-:2: parser error : xmlParseEntityRef: no name
tion":[function(n,e,t){function r(n){return!(n&&n instanceof Function&&n.apply&&
                                                                               ^
-:2: parser error : error parsing attribute name
)}function u(n,e,i,o){i||(i="");var a,f,c,u="-"===i.charAt(0);for(c=0;c<e.length
                                                                               ^
-:2: parser error : attributes construct error
)}function u(n,e,i,o){i||(i="");var a,f,c,u="-"===i.charAt(0);for(c=0;c<e.length
                                                                               ^
-:2: parser error : Couldn't find end of Start Tag e.length line 2
)}function u(n,e,i,o){i||(i="");var a,f,c,u="-"===i.charAt(0);for(c=0;c<e.length
                                                                               ^
-:2: parser error : xmlParseEntityRef: no name
it(t,r,i,e)}catch(a){d([a,t,r,i])}c=o}}function p(n,e){if(Object.defineProperty&
                                                                               ^
-:2: parser error : EntityRef: expecting ';'
catch(a){d([a,t,r,i])}c=o}}function p(n,e){if(Object.defineProperty&&Object.keys
                                                                               ^
-:2: parser error : xmlParseEntityRef: no name
t:function(e){return n[t]=e,e}})}),e}catch(r){d([r])}for(var i in n)f.call(n,i)&
                                                                               ^
-:2: parser error : xmlParseEntityRef: no name
:function(e){return n[t]=e,e}})}),e}catch(r){d([r])}for(var i in n)f.call(n,i)&&
                                                                               ^
cdwilson commented 4 years ago

Whoops, ignore above, I forgot the -L flag to curl:

$ curl -L -s https://www.keil.com/pack/index.pidx | xmllint --noout --schema PackIndex.xsd -
- validates
JonatanAntoni commented 4 years ago

Hi @cdwilson,

curl does not follow redirects unless told to, so you need to use -L in such a case.

curl -L https://www.keil.com/pack/index.pidx | xmllint --noout --schema PackIndex.xsd -
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0  7765    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100 76375  100 76375    0     0  58346      0  0:00:01  0:00:01 --:--:--  228k
- validates

There is nothing fundamentally wrong with the redirect itself. It's just a matter of coping with these redirects properly. I am not an expert on that "XML SAX parser available in the Oracle JDK". Can you come up with a small command line reproducer revealing the issue? E.g. a Java program I can run from the command line in a similar way to the curl command above? This might be helpful for our web team to analyse the issue.

Cheers, Jonatan

cdwilson commented 4 years ago

Yup, I realized that right after I posted it... [facepalm]

The original error message that @ilg-ul posted looks similar to the errors that curl is throwing when I forgot the -L flag, i.e.

org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 50; White spaces are required between publicId and systemId.

vs.

parser error : Space required after the Public Identifier

I wonder if there is some option similar to curl's -L that needs to be passed to the SAX parser.

ilg-ul commented 4 years ago

I did some further tests and the problem is definitely related to the redirection.

The problem is not in the SAX parser itself, but in the HttpURLConnection, used to read the content.

For reasons that I have not identified yet, in some cases this class does not follow redirections, and returns the error body issued by the server (HTML content). This string is obviously not well-formed XML, and the SAX parser fails with that SAXParseException.

[Edit: The class does not follow redirections from http to https.]
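
To illustrate the failure mode described above, here is a minimal offline sketch: it feeds an HTML body to the JDK's SAX parser and checks whether it parses. The HTML string and the names `HtmlIntoSax` and `isWellFormedXml` are hypothetical, made up for illustration; the actual body returned with the 302 response is not shown in this thread.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

final class HtmlIntoSax {
    /** Returns true if the text parses as well-formed XML, false if the SAX parser rejects it. */
    static boolean isWellFormedXml(String text) {
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(
                    new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8)),
                    new DefaultHandler());
            return true;
        } catch (SAXParseException e) {
            // This is the exception type reported in this issue.
            return false;
        } catch (Exception e) {
            // Parser configuration or I/O problems are not expected for in-memory input.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical redirect body: an HTML doctype with a public ID but no system ID.
        String html = "<!DOCTYPE HTML PUBLIC \"-//W3C//DTD HTML 4.01//EN\">"
                + "<html><head><title>Moved</title></head><body>Moved</body></html>";
        String xml = "<index><vendor>Keil</vendor></index>";
        System.out.println("html well-formed: " + isWellFormedXml(html));
        System.out.println("xml well-formed:  " + isWellFormedXml(xml));
    }
}
```

The point of the sketch is only that the parser is doing its job: whatever bytes the HTTP layer hands over are parsed as XML, so an unfollowed redirect surfaces as a parse error rather than as an HTTP error.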

Can you confirm that before the move to windows.net, the index.pidx file had no redirects at all? That would explain why it worked for so long and failed recently.

The strange thing is that in some other cases, exactly the same code used in the plug-ins performs as expected, following the redirect and returning the xml, not the error html. [Edit: the separate tests worked because they used https.]

I'll try to identify the reason for this inconsistent behaviour, and a possible solution to avoid it.

Evgueni @edriouk, any thoughts on this?

edriouk commented 4 years ago

Liviu, have you tried to use HttpURLConnection methods setFollowRedirects() and/or setInstanceFollowRedirects()? If it does not help, I see currently only the possibility to download the file first and then parse it.

ilg-ul commented 4 years ago

setFollowRedirects()

I checked and this property is already set to true. :-(

currently only the possibility to download the file first and then parse it

I already do this (actually I use an internal buffer). The problem occurs when reading the file via HttpURLConnection: instead of the XML I get the HTML error page.

The only way out I can see now is to explicitly process redirects in my code, which is silly.
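
A minimal sketch of what explicitly processing redirects could look like, assuming plain `HttpURLConnection` and a small redirect limit. The class name `RedirectFollower`, the helper names, and the limit of 5 are hypothetical; this is not the code used by any of the plug-ins discussed here.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URL;

final class RedirectFollower {
    // Hypothetical limit, to avoid looping forever on circular redirects.
    static final int MAX_REDIRECTS = 5;

    /** True for the status codes that normally carry a Location header. */
    static boolean isRedirect(int status) {
        return status == 301 || status == 302 || status == 303
                || status == 307 || status == 308;
    }

    /** Resolves a possibly relative Location value against the current address. */
    static String resolveLocation(String current, String location) {
        return URI.create(current).resolve(location).toString();
    }

    /** Opens the address, following redirects manually, including http to https. */
    static InputStream open(String address) throws IOException {
        String current = address;
        for (int i = 0; i <= MAX_REDIRECTS; i++) {
            HttpURLConnection conn = (HttpURLConnection) new URL(current).openConnection();
            // Disable the built-in handling so every hop is processed here.
            conn.setInstanceFollowRedirects(false);
            int status = conn.getResponseCode();
            if (!isRedirect(status)) {
                return conn.getInputStream();
            }
            String location = conn.getHeaderField("Location");
            conn.disconnect();
            if (location == null) {
                throw new IOException("Redirect without Location header from " + current);
            }
            current = resolveLocation(current, location);
        }
        throw new IOException("Too many redirects starting at " + address);
    }
}
```

The returned stream can then be copied into a buffer or handed to the parser as before. Resolving the `Location` value against the current address keeps relative redirects working as well as absolute ones.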

edriouk commented 4 years ago

you can have a look how our code in CpRepoServiceProvider.readIndexFile() works: https://github.com/ARM-software/cmsis-pack-eclipse/blob/master/com.arm.cmsis.pack.installer/src/com/arm/cmsis/pack/installer/CpRepoServiceProvider.java

JonatanAntoni commented 4 years ago

Well, as far as I can recall, we have been using redirection for quite a while. I need to dig deeper to understand whether anything changed recently.

To be honest, I don't know what we should do if the implementation you are using is causing the weird behavior.

I cannot see anything special about the redirect, and it works without issues using curl.

JonatanAntoni commented 4 years ago

Hi @ilg-ul,

I got some feedback from the web team. They moved from redirecting to http to redirecting to https on Jan 8th. This might indeed cause issues.

May I ask you to update the URL from http://www.keil.com/pack/index.pidx to https://www.keil.com/pack/index.pidx, please? Does this change anything on your end?

Cheers, Jonatan

ilg-ul commented 4 years ago

They moved from redirecting to http to redirecting to https on Jan 8th. This might indeed cause issues.

Indeed.

update the URL from http://www.keil.com/pack/index.pidx to https://www.keil.com/pack/index.pidx, please? Does this change anything on your end?

Yes, now it no longer throws the exception.

It looks like the Java classes cannot redirect from http to https.

Please note that your URL change is not reflected in the documentation, which still points to http.

https://arm-software.github.io/CMSIS_5/Pack/html/packIndexFile.html

I think that you should explicitly announce this configuration change.

ilg-ul commented 4 years ago

have a look how our code in CpRepoServiceProvider.readIndexFile()

Thank you Evgueni. Yes, you are explicitly processing redirects, and do not rely on moody implementations. Good to know.