genouest / biomaj-download

Download microservice for BioMAJ
GNU Affero General Public License v3.0

Redirection website #33

Closed braffes closed 4 years ago

braffes commented 4 years ago

Hi, I have a problem: a few websites return a 301 or 302 HTTP code when I try to access a file. As a temporary workaround, I have to put the redirection target manually in the bank configuration. One possible fix is to add self.crl.setopt(pycurl.FOLLOWLOCATION, True) in curl.py. The drawback (if it is one) is that this will also follow every HTTP to HTTPS redirection without warning.

So, would it be possible to add an option, as you did with SSH_KNOWNHOSTS for example, to accept or reject redirections for a bank, or to find another way to solve the redirection problem?
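As a rough illustration of the concern above, a redirection can be classified before deciding whether to accept it. The sketch below uses only the Python standard library; the function name `classify_redirect` and its return labels are hypothetical and are not part of biomaj-download.

```python
from urllib.parse import urljoin, urlsplit


def classify_redirect(original_url, location):
    """Describe how a redirect target differs from the original URL.

    Returns "cross-host", "https-upgrade", "same-origin" or "other".
    Hypothetical helper, for illustration only.
    """
    # The Location header may be relative, so resolve it first.
    target = urlsplit(urljoin(original_url, location))
    source = urlsplit(original_url)

    if source.hostname != target.hostname:
        return "cross-host"
    if source.scheme == "http" and target.scheme == "https":
        return "https-upgrade"
    if source.scheme == target.scheme and source.port == target.port:
        return "same-origin"
    return "other"
```

A per-bank option could then accept only some of these classes, for example `"https-upgrade"` and `"same-origin"`, and reject the rest.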

Thanks for your attention,

Brice

osallou commented 4 years ago

@duboism can you check this along with your current PR? I think redirections should always be followed anyway (I am surprised it is not the default).

duboism commented 4 years ago

Hello,

@braffes is right: cURL defaults to not following redirections (see the doc of CURLOPT_FOLLOWLOCATION and the man page for option -L).

It should be relatively easy to add an option to block redirections (I think they should be allowed by default). Note that options in biomaj can be set globally and overridden for a bank (that is, you can block redirections globally but allow them for some banks). We should also log somewhere when a redirection happens.
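A minimal sketch of that global-then-bank lookup, using the standard `configparser` module rather than BioMAJ's own configuration code. The option name `allow_redirections`, the inline property strings, and both helper functions are assumptions made for illustration.

```python
import configparser
import logging

logger = logging.getLogger("redirect-option-sketch")

# Hypothetical global and bank properties; the option name is an assumption.
GLOBAL_PROPERTIES = """
[GENERAL]
allow_redirections=0
"""

BANK_PROPERTIES = """
[GENERAL]
db.name=redirect
allow_redirections=1
"""


def resolve_allow_redirections(global_text, bank_text, default=True):
    """Return the effective setting: default, then global, then bank."""
    value = default
    for text in (global_text, bank_text):
        # interpolation=None so values such as %(remoterelease)s are kept as-is.
        parser = configparser.ConfigParser(interpolation=None)
        parser.read_string(text)
        if parser.has_option("GENERAL", "allow_redirections"):
            value = parser.getboolean("GENERAL", "allow_redirections")
    return value


def log_redirection(requested_url, effective_url):
    """Log a warning and return True if the final URL differs from the request."""
    if requested_url != effective_url:
        logger.warning("Redirected from %s to %s", requested_url, effective_url)
        return True
    return False
```

With the sample strings above, redirections are blocked globally but allowed for the bank, which matches the case described in the comment.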

@braffes: is it a problem for you to be redirected to HTTPS?

braffes commented 4 years ago

Hi, no, it's not a problem for me to be redirected to HTTPS, especially if the redirection is logged.

duboism commented 4 years ago

OK. Can you give us an example of a bank that uses redirections for our tests?

braffes commented 4 years ago
[GENERAL]
######################

### Initialization ###
db.fullname="redirect http to https system bank test"
db.name=redirect
db.type=nucleic_protein
db.formats=fasta

offline.dir.name=offline/test/local_tmp
dir.version=test/directhttp

frequency.update=0

### Synchronization ###

files.num.threads=1

# https needed
protocol=directhttp
server=plasmodb.org
remote.dir=/common/downloads/Current_Release/PvivaxSal1/fasta/data/PlasmoDB-%(remoterelease)s_PvivaxSal1_Genome.fasta
target.name=%(remote.dir)s

# https needed
release.protocol=directhttp
release.server=plasmodb.org
release.remote.dir=/common/downloads/Current_Release/Build_number
release.file=Build_number
release.regexp=(.*)

local.files=^.*$

## Post Process  ##  The files should be located in the projectfiles/process
BLOCKS=

### Deployment ###

keep.old.version=1

Here is an example of a bank that gets an HTTP 301 code (the server redirects HTTP to HTTPS).
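For tests that should not depend on a remote site, the same behaviour can be reproduced locally. The sketch below starts a toy server that answers every GET with a 301 to an `https://` URL and reads the status and `Location` header with `http.client`, which does not follow redirections. The handler, the `example.org` target, and the function names are all made up for illustration.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class RedirectToHttps(BaseHTTPRequestHandler):
    """Toy handler: answer every GET with a 301 to an https:// URL."""

    def do_GET(self):
        self.send_response(301)
        self.send_header("Location", "https://example.org" + self.path)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, format, *args):
        pass  # keep the example quiet


def probe(host, port, path):
    """Return (status, Location header) without following any redirection."""
    conn = http.client.HTTPConnection(host, port, timeout=5)
    try:
        conn.request("GET", path)
        response = conn.getresponse()
        return response.status, response.getheader("Location")
    finally:
        conn.close()


def demo():
    # Port 0 lets the OS pick a free port for the local test server.
    server = HTTPServer(("127.0.0.1", 0), RedirectToHttps)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        return probe("127.0.0.1", server.server_port, "/Build_number")
    finally:
        server.shutdown()
        server.server_close()
```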

duboism commented 4 years ago

I have started a PR (see #35) to solve that. The basics work but I still have some doubts about how to initialize the default.

braffes commented 4 years ago

I don't see any reason to initialize it differently from ssl_verifyhost and ssl_verifypeer.

duboism commented 4 years ago

Let's discuss that on the page of PR #35.

duboism commented 4 years ago

If I'm correct, this issue can be closed after the integration of #35 and the release of v3.2.3.