ebi-gene-expression-group / atlas-annotations

The pipeline producing bioentity annotations used in Expression Atlas searches

Downloader for array designs fails silently #23

Open pcm32 opened 5 years ago

pcm32 commented 5 years ago

We have seen cases where array design downloads fail silently: they generate a file full of HTML errors instead of a proper design file, yet the process itself does not fail. It should either exit with an error code or retry and then fail.
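A minimal sketch of the kind of post-download check that would turn this into a hard failure. It is written in Python purely for illustration (the pipeline's own download code has not been located yet), and the names `HTML_MARKERS`, `looks_like_html_error` and `check_design_file` are hypothetical. It assumes a valid array design file never contains HTML tags such as the ones in the output shown in this issue.

```python
import sys

# Hypothetical markers; assumed never to occur in a valid array design file.
HTML_MARKERS = ("<html", "<!doctype", "<h1>", "<div", "</div>", "<li ")


def looks_like_html_error(text: str) -> bool:
    """Return True if the text contains any of the HTML markers."""
    lowered = text.lower()
    return any(marker in lowered for marker in HTML_MARKERS)


def check_design_file(path: str) -> int:
    """Return 0 if the file looks clean, 1 otherwise (usable as an exit code)."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        contents = handle.read()
    if looks_like_html_error(contents):
        # Report on stderr so the message never ends up in a data file.
        print(f"ERROR: {path} contains HTML, download likely failed", file=sys.stderr)
        return 1
    return 0
```

A wrapper script could then call `sys.exit(check_design_file(path))` so that the release run stops instead of carrying on with a corrupt file.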

Output for a recent download of oryza_indica array design A-AFFY-126 produces:

                                                                    <h1><a href="/" title="Back to Server error homepage">Server error</a></h1>^M       
                                            </div>^M    
                                        <li id="about" class=" last"><a href="//www.ebi.ac.uk/about" title="About us">About us</a></li>^M       
                                        <li id="industry" class=""><a href="//www.ebi.ac.uk/industry" title="Industry">Industry</a></li>^M      
                                        <li id="research" class=""><a href="//www.ebi.ac.uk/research" title="Research">Research</a></li>^M      
                                        <li id="services" class=" first "><a href="//www.ebi.ac.uk/services" title="Services">Services</a></li>^M       
                                        <li id="training" class=""><a href="//www.ebi.ac.uk/training" title="Training">Training</a></li>^M      
                                    </ul>^M     
                                <div id="global-masthead" class="masthead grid_24">^M   
                                <div id="local-masthead" class="masthead grid_24 nomenu">^M     
                    <!-- set active class as appropriate -->^M  
                <h3 class="about"><a href="//www.ebi.ac.uk/about">About us</a></h3>^M   
                <h3 class="embl-ebi"><a href="//www.ebi.ac.uk/" title="EMBL-EBI">EMBL-EBI</a></h3>^M    
                <h3 class="industry"><a href="//www.ebi.ac.uk/industry">Industry</a></h3>^M     
                <h3 class="research"><a href="//www.ebi.ac.uk/research">Research</a></h3>^M     
                <h3 class="services"><a href="//www.ebi.ac.uk/services">Services</a></h3>^M     
                <h3 class="training"><a href="//www.ebi.ac.uk/training">Training</a></h3>^M     
                <ul id="global-nav">^M  
            <!-- NB: for additional title style patterns, see http://frontier.ebi.ac.uk/web/style/patterns -->^M        
            <!-- local-title -->^M      
            <!--This has to be one line and no newline characters-->^M  
            </div>^M    
            </nav>^M
...

pcm32 commented 5 years ago

I cannot find the download logic for this... any ideas @suhaibMo @alfonsomunozpomer? Maybe inside the Scala part?

suhaibMo commented 4 years ago

Tags found at the end of the array design file A-AFFY-141, which is related to the silent failure described above:

Modernizr enables HTML5 elements & feature detects; Respond is a polyfill for min/max-width CSS3 Media Queries^M
This problem means that the service you are trying to access is currently unavailable. We're very sorry.</p>^M
chromium.org/developers/how-tos/chrome-frame-getting-started -->^M

alfonsomunozpomer commented 4 years ago

There’s an error in the download, and the server gives you back the error page instead of whatever you asked for. Have you found out what URL is being requested?

pcm32 commented 4 years ago

This happens randomly per release on different files. I think the HTML that you see is the EBI load balancer giving you its message after the accepted timeout. What should happen, though, is that the Scala code sends this to stderr rather than to the stdout that is being used to write the file, and probably emits an error code instead of failing silently. Apparently on a retry it appends, leaving the actual content after the HTML error.
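To make the intended behaviour concrete, here is a rough sketch in Python (illustration only, not the pipeline's Scala code). The function `download_with_retries` and its `fetch` argument are hypothetical: `fetch` stands for whatever performs the request and is assumed to raise on any failure, including an error page. Each attempt is written to a temporary file and only moved into place on success, so a retry can never append to earlier output, and all diagnostics go to stderr.

```python
import os
import sys
import tempfile
from typing import Callable


def download_with_retries(fetch: Callable[[], str], dest: str, attempts: int = 3) -> None:
    """Write the result of fetch() to dest only when an attempt fully succeeds."""
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            body = fetch()
        except Exception as error:  # broad on purpose: any failure triggers a retry
            last_error = error
            print(f"attempt {attempt}/{attempts} failed: {error}", file=sys.stderr)
            continue
        # Write to a temporary file first, then replace dest in one step,
        # so a failed attempt never leaves partial or appended content behind.
        directory = os.path.dirname(os.path.abspath(dest))
        fd, tmp_path = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            handle.write(body)
        os.replace(tmp_path, dest)
        return
    # Every attempt failed: raise so the caller exits with a non-zero status.
    raise RuntimeError(f"download failed after {attempts} attempts") from last_error
```

If the exception is left uncaught at the top level, the interpreter exits with a non-zero status, which is the "emit an error code" part.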

pcm32 commented 4 years ago

This time it happened for a human array.

alfonsomunozpomer commented 4 years ago

The first thing would be to find out the URL and see whether the server is returning an appropriate error code.
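Once the URL is known, something along these lines would show what the server actually answers. This is only a sketch in Python using the standard library; `fetch_or_raise` is a hypothetical name, and treating a `text/html` content type as a failure is an assumption about what a valid design file response looks like.

```python
import urllib.error
import urllib.request


def fetch_or_raise(url: str, timeout: float = 60.0) -> bytes:
    """Return the response body, raising if the server signals an error."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            content_type = response.headers.get("Content-Type", "")
            body = response.read()
    except urllib.error.HTTPError as error:
        # urlopen raises HTTPError for 4xx/5xx responses.
        raise RuntimeError(f"{url} returned HTTP {error.code}") from error
    # Assumption: a valid design file is never served as HTML, so an HTML
    # body delivered with a success status is still treated as a failure.
    if "text/html" in content_type.lower():
        raise RuntimeError(f"{url} returned HTML ({content_type}), expected data")
    return body
```

If the error page turns out to arrive with a success status, the content type check is the part doing the work; if the server does return a 5xx code, the `HTTPError` branch already covers it.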

alfonsomunozpomer commented 4 years ago

Maybe I’m stating the obvious (this code is all very foreign to me), but it looks like the code is populating the array designs from BioMart. So the first thing should be to work out the URL, which is constructed from the BioMart URL template in BioMart.sc and the array designs listed for each species in annsrcs/ensembl.