cnr-ibf-pa / hbp-bsp-issues

Ticketing system for developers/testers and power users of the Brain Simulation Platform of the Human Brain Project

TSIUnavailableException in Simulation Analysis Step #460

Closed alex4200 closed 4 years ago

alex4200 commented 4 years ago

Expected behavior

UNICORE runs stably and is able to copy all files required for an analysis.

Actual Behavior (please include screenshot if possible)

When the user starts an analysis on PizDaint, some simulation output files need to be copied to the analysis folder. The simulation created 4 reports (rather than the default of 1). However, out of 16 such analyses, only 5 succeeded!

In all the other cases I get the same UNICORE error:

log:
0: "Tue Aug 06 11:00:52 CEST 2019: Created with ID d68a163f-0997-4aa5-854e-1504540728bb"
1: "Tue Aug 06 11:00:52 CEST 2019: Created with type 'JSDL'"
2: "Tue Aug 06 11:00:52 CEST 2019: Client: Name: CN=Alexander Dietz 305532,O=HBP↵Xlogin: uid: [bp000128], gids: [addingOSgroups: true]↵Role: user: role from attribute source↵Security tokens: User name: CN=Alexander Dietz 305532,O=HBP↵Delegation to consignor status: true, core delegation status: false↵Message signature status: UNCHECKED↵Client's original IP: 128.178.97.150"
3: "Tue Aug 06 11:00:52 CEST 2019: Using default execution environment."
4: "Tue Aug 06 11:00:52 CEST 2019: Status set to PREPROCESSING (staging in)."
5: "Tue Aug 06 11:00:52 CEST 2019: Adding stage in subaction with id=63a185b9-30f8-41aa-b056-78a7fe3fe80e"
6: "Tue Aug 06 11:00:53 CEST 2019: Stage in log:"
7: "Tue Aug 06 11:00:53 CEST 2019: Tue Aug 06 11:00:52 CEST 2019: Created with ID 63a185b9-30f8-41aa-b056-78a7fe3fe80e"
8: "Tue Aug 06 11:00:53 CEST 2019: Tue Aug 06 11:00:52 CEST 2019: Created with type 'JSDL_STAGEIN'"
9: "Tue Aug 06 11:00:53 CEST 2019: Tue Aug 06 11:00:52 CEST 2019: Client: Name: CN=Alexander Dietz 305532,O=HBP↵Xlogin: uid: [bp000128], gids: [addingOSgroups: true]↵Role: user: role from attribute source↵Security tokens: User name: CN=Alexander Dietz 305532,O=HBP↵Delegation to consignor status: true, core delegation status: false↵Message signature status: UNCHECKED↵Client's original IP: 128.178.97.150"
10: "Tue Aug 06 11:00:53 CEST 2019: Tue Aug 06 11:00:52 CEST 2019: Started filetransfer /mc2_Column_report_0.bbp -> mc2_Column_report_0.bbp"
11: "Tue Aug 06 11:00:53 CEST 2019: Tue Aug 06 11:00:52 CEST 2019: Started filetransfer /out.dat -> out.dat"
12: "Tue Aug 06 11:00:53 CEST 2019: Tue Aug 06 11:00:52 CEST 2019: Started filetransfer /SP_Ivy_report_1.bbp -> SP_Ivy_report_1.bbp"
13: "Tue Aug 06 11:00:53 CEST 2019: Tue Aug 06 11:00:52 CEST 2019: Started filetransfer /BlueConfig -> BlueConfig"
14: "Tue Aug 06 11:00:53 CEST 2019: Tue Aug 06 11:00:52 CEST 2019: Started filetransfer /mc5_Column_report_3.bbp -> mc5_Column_report_3.bbp"
15: "Tue Aug 06 11:00:53 CEST 2019: Tue Aug 06 11:00:52 CEST 2019: Started filetransfer /Random1KCentral_report_2.bbp -> Random1KCentral_report_2.bbp"
16: "Tue Aug 06 11:00:53 CEST 2019: Tue Aug 06 11:00:53 CEST 2019: Filetransfer FAILED: /mc2_Column_report_0.bbp -> mc2_Column_report_0.bbp, error message: Error executing filetransfer: de.fzj.unicore.xnjs.tsi.TSIUnavailableException: TSI unavailable: java.io.IOException: Can't create connection to TSI at daint101.cscs.ch/148.187.26.64:4433: java.net.SocketTimeoutException: Read timed out [XNJS error 10]"
17: "Tue Aug 06 11:00:53 CEST 2019: Stage in is DONE."
18: "Tue Aug 06 11:00:53 CEST 2019: Stage in was NOT SUCCESSFUL: Filetransfer FAILED: /mc2_Column_report_0.bbp -> mc2_Column_report_0.bbp, error message: Error executing filetransfer: de.fzj.unicore.xnjs.tsi.TSIUnavailableException: TSI unavailable: java.io.IOException: Can't create connection to TSI at daint101.cscs.ch/148.187.26.64:4433: java.net.SocketTimeoutException: Read timed out [XNJS error 10]"
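For anyone comparing several such runs, the failing transfer can be pulled out of a log like the one above with a small script. This is only a rough sketch: it assumes the log entries are already available as a list of strings, and the names `failed_transfers` and `is_tsi_unavailable` are hypothetical helpers, not part of UNICORE or any client library.

```python
import re

# Pattern for the "Filetransfer FAILED" entries as they appear in the log above.
FAILED_RE = re.compile(
    r"Filetransfer FAILED: (?P<source>\S+) -> (?P<target>\S+), "
    r"error message: (?P<message>.*)"
)


def failed_transfers(log_entries):
    """Return (source, target, message) for each distinct failed transfer.

    Hypothetical helper: `log_entries` is assumed to be a list of log strings.
    """
    seen = []
    for entry in log_entries:
        match = FAILED_RE.search(entry)
        if match is None:
            continue
        item = (match["source"], match["target"], match["message"])
        # The stage-in summary repeats the same failure, so skip duplicates.
        if item not in seen:
            seen.append(item)
    return seen


def is_tsi_unavailable(message):
    """True if the error message names the TSIUnavailableException."""
    return "TSIUnavailableException" in message
```

Running `failed_transfers` over the entries of each job and filtering with `is_tsi_unavailable` would show whether every failed analysis hit the same TSI timeout or whether other errors are mixed in.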

Steps to reproduce the problem

BerndSchuller commented 4 years ago

Hi Alex, these kinds of errors tend to be somewhat hard to track down. It appears to be a configuration issue at the CSCS site. I'll try to contact the CSCS administrator for the UNICORE services.

alex4200 commented 4 years ago

As of today, the problem can no longer be reproduced. Closing the ticket.