aldefouw / redcap-api-transfer

This project contains Ruby libraries that aim to make transferring a REDCap project from one server to another as simple as running a single script from command line.

Importing a WORD, PDF or any filetype other than TEXT corrupts the file using API Code (Ruby) #6

Closed meet-alikhan closed 3 years ago

meet-alikhan commented 3 years ago

We are migrating REDCap projects using this utility.

Issue: When importing files other than text files, the files upload successfully but end up corrupted.

Cause: While importing, the file is being read using File.read(file) which corrupts the file if it's not a text file.

Importing a DOC/PDF/EXCEL file from the API Playground or Postman works fine. The code in use is the same code provided in the REDCap samples: https://redcaptest.cc.nih.gov/api/help/?content=examples

Below is the code that is causing the issue. Any help would be appreciated.

def body(boundary, id, field, event_name, file)
  <<-EOF
--#{boundary}
Content-Disposition: form-data; name="file"; filename="#{File.basename(file)}"
Content-Type: application/octet-stream

#{File.read(file)}
--#{boundary}
#{import_file_fields(id, field, event_name).collect{|k,v|"Content-Disposition: form-data; name=\"#{k.to_s}\"\n\n#{v}\n--#{boundary}\n"}.join}
  EOF
end
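For comparison, here is a minimal sketch of a binary-safe way to assemble a multipart body in Ruby. It is not the library's code: `binary_multipart_body`, the boundary string, and the field values are hypothetical names chosen for illustration. It reads the file with `File.binread` and keeps every part in binary encoding, so no newline or encoding conversion can occur on any platform. Whether text-mode reading was actually the cause in this issue was never confirmed in the thread.

```ruby
require 'tempfile'

# Hypothetical helper, not part of redcap-api-transfer.
# Builds a multipart/form-data body as raw bytes.
def binary_multipart_body(boundary, fields, file_path)
  body = String.new(encoding: Encoding::BINARY)
  append = ->(text) { body << text.to_s.b } # .b returns a binary-encoded copy

  # Plain form fields first.
  fields.each do |name, value|
    append.call("--#{boundary}\r\n")
    append.call("Content-Disposition: form-data; name=\"#{name}\"\r\n\r\n")
    append.call("#{value}\r\n")
  end

  # File part: headers, a blank line, then the untouched file bytes.
  append.call("--#{boundary}\r\n")
  append.call("Content-Disposition: form-data; name=\"file\"; filename=\"#{File.basename(file_path)}\"\r\n")
  append.call("Content-Type: application/octet-stream\r\n\r\n")
  append.call(File.binread(file_path)) # binary mode: bytes are read as-is
  append.call("\r\n--#{boundary}--\r\n")
  body
end

# Usage with a throwaway file holding bytes that text mode can mangle on some platforms.
payload = "\x00\xFF\r\n\x1A%PDF".b
sample = Tempfile.new(['sample', '.pdf'])
sample.binmode
sample.write(payload)
sample.close

multipart = binary_multipart_body('boundary123', { content: 'file', action: 'import' }, sample.path)
puts multipart.include?(payload) # the file bytes appear unchanged inside the body
```

Building the body from binary-encoded pieces also avoids encoding compatibility errors when a form value contains non-ASCII text.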

aldefouw commented 3 years ago

@meet-alikhan - Please pull down the latest version of the library and set your project's "verbose" setting to "true".

project_name:
    source:
      url:
      token:

    destination:
      url: 
      token: 

    processes: 1
    verbose: true

Make sure that you rebuild your Docker image before running the script again. (This will ensure you are using the newest version of the codebase when you run the shell script.)

The update will take into account whether your file upload is capable of using SSL. (It uses your destination URL to determine whether SSL is possible.)
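The thread does not show how the library makes this decision. As a rough illustration only, a check based on the URL scheme could look like the sketch below; `ssl_destination?` and the example URLs are hypothetical.

```ruby
require 'uri'

# Hypothetical helper, not the library's implementation:
# treats a destination as SSL-capable when its URL uses the https scheme.
def ssl_destination?(url)
  URI.parse(url).scheme.to_s.casecmp?('https')
end

puts ssl_destination?('https://redcap.example.org/api/')
puts ssl_destination?('http://localhost:8080/api/')
```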

If you set verbose: true, part of your output will show what your file stream looks like. Again, I haven't been able to reproduce your results with a test PDF file. The file uploads work fine for me on 10.6.1x.

Also ... should note that the file uploads even seem to work across different versions. I ran a test between 10.6 and 10.3 and it worked fine.

Finally ... you should know that I cannot see this: https://redcaptest.cc.nih.gov/api/help/?content=examples

It's probably firewalled. But basically what you're saying is that I should download the zip of examples ... and you are correct that the example file already pretty much looks like what my script has for importing a file ...

meet-alikhan commented 3 years ago

@aldefouw - I have shared my code with you through my personal repository.

Note: Please ignore a few of the changes in the code that I made according to our requirements. Also, please bear with me for any bad Ruby code here: I am a full-time React/JavaScript/.NET developer and have been learning Ruby for the first time through this utility.

aldefouw commented 3 years ago

@meet-alikhan - I performed test transfers of the Word documents you provided, using my Docker instances of REDCap running 10.6.15. They transfer flawlessly from source to destination using my unmodified library. There is no file corruption, and the files on the destination appear identical to the source files.
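One way to check "identical" objectively is to compare checksums of the original file and the copy downloaded from the destination. The following is a minimal sketch, assuming both files are available on local disk; `same_content?` and the temporary files are hypothetical and only stand in for real source and destination copies.

```ruby
require 'digest'
require 'tempfile'

# Hypothetical helper: true when both files have the same SHA-256 digest.
def same_content?(path_a, path_b)
  Digest::SHA256.file(path_a).hexdigest == Digest::SHA256.file(path_b).hexdigest
end

# Stand-ins for a source file, a faithful copy, and a corrupted copy.
def temp_with(bytes)
  file = Tempfile.new('transfer-check')
  file.binmode
  file.write(bytes)
  file.close
  file
end

original  = temp_with("\x00\xFFdocument bytes\r\n".b)
good_copy = temp_with("\x00\xFFdocument bytes\r\n".b)
bad_copy  = temp_with("\x00?document bytes\n".b)

puts same_content?(original.path, good_copy.path)
puts same_content?(original.path, bad_copy.path)
```

A digest mismatch confirms that the bytes changed somewhere between source and destination, but it does not say where.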

Because your repository has deviated so far from my original source code, I suspect I'd not only have to perform a full code review in order to determine what is happening but also see your specific setups. (I'm not offering, but that's what it would take.)

If I had to make an informed guess, I think there is some problem related to configuration of your REDCap vs. what your script is configured for because I was able to successfully transfer files using YOUR source code too ... if I removed your CACERT code.

I understand you might need that cacert code for SSL support perhaps, but I don't have that need for my test instances because I'm working locally between Docker containers to run my tests on your code.

I also noticed in your logs that you had SSL errors so you might want to investigate:

E, [2021-09-17T01:47:54.300870 #18848] ERROR -- : Failure when attempting to download data template. Possible reason: 
E, [2021-09-17T01:47:54.303075 #18848] ERROR -- : Error: [Curl::Err::SSLPeerCertificateError, "SSL peer certificate or SSH remote key was not OK"]
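That curl error generally means the server's certificate could not be verified against the trusted CA certificates. To check this independently of the transfer library, a small probe using Ruby's standard `Net::HTTP` (not the Curl bindings the library uses) could look like the sketch below. The helper names, the example URL, and the optional CA bundle path are all hypothetical.

```ruby
require 'net/http'
require 'openssl'
require 'uri'

# Hypothetical helper: an HTTP client that insists on verifying the peer certificate.
def verified_http(url, ca_file: nil)
  uri = URI.parse(url)
  http = Net::HTTP.new(uri.host, uri.port)
  http.use_ssl = (uri.scheme == 'https')
  http.verify_mode = OpenSSL::SSL::VERIFY_PEER
  http.ca_file = ca_file if ca_file # optional custom CA bundle (assumed path)
  http
end

# Hypothetical helper: opens a connection and returns the SSL error message,
# or nil when the handshake succeeds. Other network errors are not rescued.
def certificate_problem(url, ca_file: nil)
  verified_http(url, ca_file: ca_file).start { nil }
rescue OpenSSL::SSL::SSLError => e
  e.message
end

# Building the client does not open a connection.
http = verified_http('https://redcap.example.org/api/')
puts http.use_ssl?

# To probe a real server (requires network access):
# puts certificate_problem('https://your-redcap-host/api/', ca_file: '/path/to/ca-bundle.pem')
```

If the probe reports a verification failure with the system's default CA store but succeeds with an institutional CA bundle, that points to a certificate trust issue rather than to the file upload code.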

If there was a problem with my provided library, I would have been happy to fix it; however, you might imagine that I cannot spend a great deal of time troubleshooting problems that are within source code that has been modified to be specific to your institution's project requirements. By the way, because the library is open source, there is no problem with modifying what I wrote - but I hope you understand it becomes a lot harder to support once you do. (Not that I offer any SLA for my basic library ... I'm only helping you to be nice - so use at your own risk.)

In any case, my recommendation is to create a new, clean clone of the latest copy of my repository to your machine and get the basic REDCap A => B transfers working. You can turn on verbose mode to see what's going on behind the scenes. Once you have non-corrupted file uploads on your destination side, you can start writing whatever custom functionality you need.

Sorry that you are having trouble transferring those files, but this is not a problem related to my script so there is no code fix I can write to fix it.

One thought I have though:

I noticed that you moved the Gemfile from Ruby 2.6.5 to 2.7.3. The entire impetus for using a Docker container to execute the calls is that it keeps the Ruby version stable inside an environment I can guarantee. I know that might seem trivial, but the container ensures you're using the same versions of the Ruby and CURL dependencies that I am. If you run against a different version of Ruby (perhaps on your local system), you're also running the libraries against whatever version of CURL is on your machine. Different curl installations behave differently, and that might be part of your issue as well. Are you using the Docker container to execute the commands? If not, that's your first goal.

Note within the Dockerfile that I'm actually compiling a unique copy of CURL that has the expected SSL behaviors that we want:

COPY curl-7.71.1.tar.gz /
RUN tar -xvf curl-7.71.1.tar.gz
RUN /bin/bash -c "curl-7.71.1/./configure --without-nss --with-openssl"
WORKDIR curl-7.71.1
RUN make && make install

If you find that you need additional SSL settings for cacerts, I'm happy to see if there is a way to support them generically in the base library. In that case, you'll have to tell me which settings you need me to add.

meet-alikhan commented 3 years ago

@aldefouw - That's a good finding for me. If the same code is working for you, then you may be correct: something on the REDCap side in our environments is causing the failure. I totally understand your point; I don't want you to spend a long time reviewing my code either. I wanted you to verify my file import code, which you did, and I really appreciate your help. Let me try digging deeper on our end. I will keep you updated and close the issue for now.