georchestra / cadastrapp

Cadastre application for geOrchestra
GNU General Public License v3.0

createBordereauParcellaire - XSDs are downloaded from external sources on every generation #548

Open pmauduit opened 3 years ago

pmauduit commented 3 years ago

Rennes-Métropole asked us to investigate why generating the "bordereau parcellaire" PDF was taking so long on their platform (generally more than 30 seconds). Here is a summary of what we have noticed so far:

After instrumenting the cadastrapp JVM on the test platform, we got the following results during a call to the webservice under test:

[Screenshots: "Screenshot from 2020-12-22 12-14-57" and "Screenshot from 2020-12-22 12-09-22", JVM instrumentation results]

Digging a bit further with tcpdump on the running container, we discovered that the XSDs needed to validate the GetCapabilities document were fetched every time.

Here is the list of requests made to inspire.ec.europa.eu (plain HTTP without TLS, so "easily" captured with tcpdump or similar tools):

http://inspire.ec.europa.eu/schemas/inspire_vs/1.0/inspire_vs.xsd
http://inspire.ec.europa.eu/schemas/common/1.0/common.xsd
http://inspire.ec.europa.eu/schemas/common/1.0/network.xsd
http://inspire.ec.europa.eu/2001/xml.xsd
http://inspire.ec.europa.eu/schemas/common/1.0/enums/enum_bul.xsd
http://inspire.ec.europa.eu/schemas/common/1.0/enums/enum_cze.xsd
http://inspire.ec.europa.eu/schemas/common/1.0/enums/enum_dan.xsd
http://inspire.ec.europa.eu/schemas/common/1.0/enums/enum_dut.xsd
http://inspire.ec.europa.eu/schemas/common/1.0/enums/enum_eng.xsd
http://inspire.ec.europa.eu/schemas/common/1.0/enums/enum_est.xsd
http://inspire.ec.europa.eu/schemas/common/1.0/enums/enum_fin.xsd
http://inspire.ec.europa.eu/schemas/common/1.0/enums/enum_fre.xsd
http://inspire.ec.europa.eu/schemas/common/1.0/enums/enum_ger.xsd
http://inspire.ec.europa.eu/schemas/common/1.0/enums/enum_gle.xsd
http://inspire.ec.europa.eu/schemas/common/1.0/enums/enum_gre.xsd
http://inspire.ec.europa.eu/schemas/common/1.0/enums/enum_hun.xsd
http://inspire.ec.europa.eu/schemas/common/1.0/enums/enum_ita.xsd
http://inspire.ec.europa.eu/schemas/common/1.0/enums/enum_lav.xsd
http://inspire.ec.europa.eu/schemas/common/1.0/enums/enum_lit.xsd
http://inspire.ec.europa.eu/schemas/common/1.0/enums/enum_mlt.xsd
http://inspire.ec.europa.eu/schemas/common/1.0/enums/enum_pol.xsd
http://inspire.ec.europa.eu/schemas/common/1.0/enums/enum_por.xsd
http://inspire.ec.europa.eu/schemas/common/1.0/enums/enum_rum.xsd

Note: fetching them with cURL in an automated way takes only ~3 seconds. I would expect at least similar performance from the JVM's HTTP client, and I cannot explain why it takes more than 4 times as long (~16 seconds according to our instrumentation, see the screenshots above), although the JVM sampling certainly introduces some overhead.
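To get a JVM-side number that can be compared with the ~3 s cURL figure without a profiler attached, a small standalone timer could fetch the same URLs one after another. This is a sketch, not cadastrapp or GeoTools code: the class and method names are hypothetical, and it uses the JDK 11+ `java.net.http.HttpClient`, which is not necessarily the HTTP client GeoTools uses internally, so it only gives a baseline.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.util.List;
import java.util.function.Function;

// Hypothetical diagnostic (not part of cadastrapp): download a list of schema
// URLs one after another, like a simple cURL loop, and time the whole run.
class XsdFetchTimer {

    /** Outcome of one timing run. */
    static final class Result {
        final int count;
        final long totalBytes;
        final Duration elapsed;

        Result(int count, long totalBytes, Duration elapsed) {
            this.count = count;
            this.totalBytes = totalBytes;
            this.elapsed = elapsed;
        }
    }

    /** Default downloader built on the JDK 11+ HTTP client. */
    static Function<URI, byte[]> jdkFetcher() {
        HttpClient client = HttpClient.newBuilder()
                .connectTimeout(Duration.ofSeconds(10))
                .followRedirects(HttpClient.Redirect.NORMAL) // follows http -> https redirects
                .build();
        return uri -> {
            HttpRequest request = HttpRequest.newBuilder(uri).GET().build();
            try {
                HttpResponse<byte[]> response =
                        client.send(request, HttpResponse.BodyHandlers.ofByteArray());
                if (response.statusCode() != 200) {
                    throw new IOException("HTTP " + response.statusCode() + " for " + uri);
                }
                return response.body();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // keep the interrupt flag set
                throw new IllegalStateException("Interrupted while fetching " + uri, e);
            }
        };
    }

    /** Fetches every URL sequentially and measures the total wall-clock time. */
    static Result timeSequential(List<URI> uris, Function<URI, byte[]> fetcher) {
        long start = System.nanoTime();
        long bytes = 0;
        for (URI uri : uris) {
            bytes += fetcher.apply(uri).length;
        }
        return new Result(uris.size(), bytes, Duration.ofNanos(System.nanoTime() - start));
    }
}
```

For example, `XsdFetchTimer.timeSequential(urls, XsdFetchTimer.jdkFetcher())` over the 23 URLs listed above returns the count, total bytes and elapsed time; passing a different fetcher function makes it possible to compare HTTP clients with the same loop.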

By the way, one improvement could be to cache the XSDs, which seems to be possible with GeoTools: https://docs.geotools.org/stable/javadocs/org/geotools/xml/resolver/SchemaCache.html
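The idea behind such a cache can be sketched without GeoTools: map each schema URL to a file under a local directory and only download on a miss. This is a hypothetical illustration of the technique, NOT the GeoTools `SchemaCache` API; the class name, the directory layout (`<root>/<host>/<path>`) and the injected downloader are all assumptions.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.function.Function;

// Hypothetical sketch, NOT the GeoTools SchemaCache API: keep a local copy of
// each remote XSD so it is downloaded once instead of on every generation.
class LocalXsdCache {
    private final Path root;                         // cache directory (assumed writable)
    private final Function<URI, byte[]> downloader;  // e.g. an HTTP GET

    LocalXsdCache(Path root, Function<URI, byte[]> downloader) {
        this.root = root.toAbsolutePath().normalize();
        this.downloader = downloader;
    }

    /** Maps http://host/dir/file.xsd to <root>/host/dir/file.xsd. */
    Path localPathFor(URI uri) {
        String path = uri.getPath() == null ? "" : uri.getPath().replaceFirst("^/+", "");
        if (uri.getHost() == null || path.isEmpty()) {
            throw new IllegalArgumentException("Not a schema URL: " + uri);
        }
        Path local = root.resolve(uri.getHost()).resolve(path).normalize();
        if (!local.startsWith(root)) { // refuse "../" segments escaping the cache
            throw new IllegalArgumentException("Path escapes cache: " + uri);
        }
        return local;
    }

    /** Returns the local copy, downloading the schema only on a cache miss. */
    Path resolve(URI uri) {
        Path local = localPathFor(uri);
        if (Files.exists(local)) {
            return local; // cache hit: no network access
        }
        try {
            Files.createDirectories(local.getParent());
            // Write to a temporary file first, then move it into place, so readers
            // do not pick up a half-written XSD.
            Path tmp = Files.createTempFile(local.getParent(), "xsd-", ".part");
            try {
                Files.write(tmp, downloader.apply(uri));
                Files.move(tmp, local, StandardCopyOption.REPLACE_EXISTING);
            } finally {
                Files.deleteIfExists(tmp); // no-op after a successful move
            }
            return local;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```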

Another remark: I know that geOrchestra is in no position to lecture anyone on this topic, but the GeoTools version used in cadastrapp is quite old (9.2, which, as far as I remember, was the same as mapfishapp's at the time cadastrapp was developed). Testing a more recent version might also improve things? We are currently on 21.3 with mapfishapp: https://github.com/georchestra/georchestra/blob/master/pom.xml#L45

landryb commented 3 years ago

ugh. horrible. Whatever the fix, by all means +1000 on doing something to improve that :)

pierrejego commented 3 years ago

@pmauduit yes, I agree about the GeoTools version: when we developed this extension we wanted the same dependencies as mapfishapp, but we never updated them afterward.

As for the latency, the strange thing is that on the JDev environment, printing a "bordereau parcellaire" took less than 3 seconds. I need to reinstall the backend and run more tests to see whether I get the same schema calls.

But anyway, updating GeoTools and adding a schema cache is a good idea.

catmorales commented 3 years ago

@pierrejego will you add the schema cache described above in the next cadastrapp release? We get timeouts (504 Gateway Time-out, more than 50 s) when creating BPs in batch with requests such as this one: https://portail.sig.rennesmetropole.fr/cadastrapp/services/createBordereauParcellaire?parcelle=350001000AB0059%2C350001000AB0066%2C350001000AB0071%2C350001000AB0504%2C350001000AB0525%2C350001000AB0701%2C350001000AB0715%2C350001000AB0716%2C350001000AB0779%2C350001000AB0981%2C350001000AB1183%2C350001000AB1184%2C350001000AB1370%2C350001000AB0050%2C350001000AB1376%2C350001000AB1400%2C350001000AB1584&personaldata=0&basemapindex=0&fillcolor=81BEF7&opacity=0.4&strokecolor=111111&strokewidth=3

landryb commented 2 years ago

supposedly closed by fabc4db5f?

pierrejego commented 2 years ago

No, what I have done is not enough. When testing, I can't see any XSD in the temp folder. I have never used SchemaCache; even if it is automatically configured, I think I need to declare something in GeoTools to use it. If someone has an example, I would be interested. I checked the GeoServer source code, but I didn't find where they enable it.
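One quick way to check whether any cache is being populated is to list the `.xsd` files under the folder it is expected to use. A hypothetical diagnostic sketch (the class name is an assumption; it defaults to `java.io.tmpdir`, or takes the actual cache directory as its first argument):

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Hypothetical diagnostic: list every *.xsd file under a directory to check
// whether a schema cache is actually writing anything there.
class XsdCacheInspector {

    static List<Path> findXsdFiles(Path dir) {
        List<Path> found = new ArrayList<>();
        if (!Files.isDirectory(dir)) {
            return found; // nothing to inspect
        }
        try {
            Files.walkFileTree(dir, new SimpleFileVisitor<Path>() {
                @Override
                public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
                    String name = file.getFileName().toString().toLowerCase(Locale.ROOT);
                    if (attrs.isRegularFile() && name.endsWith(".xsd")) {
                        found.add(file);
                    }
                    return FileVisitResult.CONTINUE;
                }

                @Override
                public FileVisitResult visitFileFailed(Path file, IOException exc) {
                    return FileVisitResult.CONTINUE; // skip unreadable entries (common in /tmp)
                }

                @Override
                public FileVisitResult postVisitDirectory(Path d, IOException exc) {
                    return FileVisitResult.CONTINUE; // ignore errors while listing a directory
                }
            });
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Collections.sort(found);
        return found;
    }

    public static void main(String[] args) {
        Path dir = Path.of(args.length > 0 ? args[0] : System.getProperty("java.io.tmpdir"));
        List<Path> xsds = findXsdFiles(dir);
        System.out.println(xsds.size() + " XSD file(s) under " + dir);
        xsds.forEach(System.out::println);
    }
}
```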

MaelREBOUX commented 2 years ago

No improvement for us. Generating PDF is still slow.

pierrejego commented 2 years ago

Changing cadastre.wms.url so that it points to the cadastrapp workspace rather than to the whole GeoServer fixes the slowness.

But at Rennes Métropole we have a problem. I concluded that when the workspace is included in the WMS URL (cadastre.wms.url=https://portail-test.sig.rennesmetropole.fr/geoserver/app/wms), cadastrapp uses the layer's default SLD (which is transparent), whereas otherwise (cadastre.wms.url=https://portail-test.sig.rennesmetropole.fr/geoserver/wms) it uses the parameter sent by getImageBordereau???
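For reference, the two forms of `cadastre.wms.url` compared above differ only by the workspace segment: GeoServer exposes a workspace-specific "virtual service" endpoint at `<geoserver>/<workspace>/wms`, which only advertises the layers of that workspace. One plausible, unverified explanation for the speedup is that the workspace-scoped GetCapabilities document is much smaller; fetching both capabilities URLs and comparing their sizes would be a quick check. The helper below is hypothetical, and the WMS version 1.3.0 in the query is an assumption.

```java
import java.net.URI;

// Hypothetical helper (illustrative names, not cadastrapp code): build the
// workspace-scoped ("virtual service") WMS endpoint of a GeoServer instance and
// a GetCapabilities URI for any WMS endpoint.
class WmsUrls {

    /** https://host/geoserver + "app" -> https://host/geoserver/app/wms */
    static String workspaceWmsUrl(String geoserverBase, String workspace) {
        if (workspace == null || workspace.isBlank() || workspace.contains("/")) {
            throw new IllegalArgumentException("Invalid workspace name: " + workspace);
        }
        String base = geoserverBase.replaceAll("/+$", ""); // drop trailing slashes
        return base + "/" + workspace + "/wms";
    }

    /** Appends a WMS 1.3.0 GetCapabilities query (version is an assumption). */
    static URI capabilitiesUri(String wmsUrl) {
        String sep = !wmsUrl.contains("?") ? "?"
                : (wmsUrl.endsWith("?") || wmsUrl.endsWith("&")) ? "" : "&";
        return URI.create(wmsUrl + sep + "SERVICE=WMS&REQUEST=GetCapabilities&VERSION=1.3.0");
    }
}
```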

pierrejego commented 2 years ago

After more testing, there is a "URL rejected" message when going through the app workspace, so the requests are being blocked by the F5.

pierrejego commented 2 years ago

Test run after unblocking: 2 min for 41 parcels on the test portal and 1 min for 46 parcels on gis.jdev.fr.

pierrejego commented 2 years ago

Keep trying to cache the XSDs, but above all the GetCapabilities response if possible.
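Caching the GetCapabilities result in memory, with an expiry, would avoid both the download and the schema fetching on every bordereau. A minimal sketch of a time-to-live cache, assuming the loader function wraps whatever call currently retrieves and parses the capabilities (that loader, and the class name, are hypothetical):

```java
import java.time.Duration;
import java.time.Instant;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical sketch: keep each loaded GetCapabilities result in memory for a
// fixed time-to-live, so it is not downloaded (and its schemas re-fetched) on
// every bordereau. V stands for whatever object the WMS client builds.
class TtlCache<K, V> {

    private static final class Entry<T> {
        final T value;
        final Instant expiresAt;

        Entry(T value, Instant expiresAt) {
            this.value = value;
            this.expiresAt = expiresAt;
        }
    }

    private final Map<K, Entry<V>> entries = new HashMap<>();
    private final Function<K, V> loader;
    private final Duration ttl;
    private final Supplier<Instant> clock; // injectable for tests; Instant::now in production

    TtlCache(Function<K, V> loader, Duration ttl, Supplier<Instant> clock) {
        this.loader = loader;
        this.ttl = ttl;
        this.clock = clock;
    }

    /**
     * Returns the cached value, reloading it once the TTL has expired.
     * synchronized: concurrent callers wait for a single load instead of all
     * hitting the WMS at once (fine for a handful of keys such as one WMS URL).
     */
    synchronized V get(K key) {
        Instant now = clock.get();
        Entry<V> entry = entries.get(key);
        if (entry == null || !now.isBefore(entry.expiresAt)) {
            // If the loader throws, the exception propagates and nothing is cached.
            entry = new Entry<>(loader.apply(key), now.plus(ttl));
            entries.put(key, entry);
        }
        return entry.value;
    }
}
```

In cadastrapp this might look like `new TtlCache<>(url -> loadCapabilities(url), Duration.ofMinutes(30), Instant::now)`, where `loadCapabilities` is a placeholder for the existing GeoTools call. Synchronizing `get` keeps the sketch simple and makes concurrent requests wait for a single load, at the cost of serializing loads for different keys.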

jusabatier commented 2 years ago

@MaelREBOUX

> No improvement for us. Generating PDF is still slow.

Even after the GeoTools upgrade?

@pierrejego

> No, what I have done is not enough. When testing, I can't see any XSD in the temp folder. I have never used SchemaCache; even if it is automatically configured, I think I need to declare something in GeoTools to use it. If someone has an example, I would be interested. I checked the GeoServer source code, but I didn't find where they enable it.

I don't know whether you made progress on this, but given https://github.com/geotools/geotools/blob/main/modules/library/xml/src/main/java/org/geotools/xml/SchemaFactory.java#L96

Shouldn't we simply set -Dschema.factory.cache=<to be defined> at the JVM level?
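For context, a `-D` flag is just a JVM system property, which code reads with `System.getProperty`. If a library reads it in a static initializer, it has to be set on the JVM command line (for instance in the servlet container's startup options) rather than from application code later on. The sketch below only shows that pattern with a fallback directory; it is hypothetical (including the default folder name), and whether GeoTools actually honours `schema.factory.cache`, or what value it expects, is not confirmed here.

```java
import java.io.File;

// Hypothetical helper: read a cache directory from the schema.factory.cache
// system property, falling back to a folder under java.io.tmpdir. Whether
// GeoTools itself honours this property is NOT confirmed here.
class SchemaCacheDirectory {
    static final String PROPERTY = "schema.factory.cache";

    static File resolve() {
        String configured = System.getProperty(PROPERTY);
        File dir = (configured == null || configured.isBlank())
                ? new File(System.getProperty("java.io.tmpdir"), "cadastrapp-xsd-cache") // hypothetical default
                : new File(configured);
        // mkdirs() returns false if the directory already exists, so check afterwards.
        if (!dir.mkdirs() && !dir.isDirectory()) {
            throw new IllegalStateException("Cannot create schema cache directory: " + dir);
        }
        return dir;
    }
}
```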

landryb commented 2 years ago

> Shouldn't we simply set -Dschema.factory.cache=<to be defined> at the JVM level?

Tested locally with backend v2.0: nothing gets cached in the directory. I don't seem to have such slowdowns; it takes ~10 s to generate a BP on my dev platform with an orthophoto basemap from IGN.

@pmauduit what was your tcpdump command to capture only the external URLs? tcpdump on port 80 on the external interface?

MaelREBOUX commented 3 months ago

Follow-up note: to be tested at Rennes AFTER upgrading the backend from 1.9 to 2.2.

landryb commented 3 months ago

GeoTools has been upgraded: