cloudinary-community / gatsby-transformer-cloudinary

Use Cloudinary images with gatsby-image for high performance and total control!
https://gatsby-transformer-cloudinary.netlify.app/
MIT License

Each build creates many new transformations on already uploaded images #34

Closed amcc closed 3 years ago

amcc commented 4 years ago

We have a library of about 11,000 images that are added in gatsby-node using exports.sourceNodes; each image is then added with createRemoteFileNode.

This successfully uploads everything to Cloudinary. The issue we're having is that it does this on every build and burns through a lot of transformations each time. All our images are now uploaded, so it's simply going through existing images each time and performing new transformations.

I've tried setting createDerived to false; in the last build it created 43,300 transformations on 11,226 images.

How do we prevent Cloudinary doing new transformations on images it already has?

(I also posted this in a closed issue that I'm unable to reopen - sorry for the multiple posts)

amcc commented 4 years ago

I've been in contact with Cloudinary support - they've suggested the following:

_

In your case, it appears that your build process is uploading the same images over and over again, and also requesting that we create new derived versions of those images corresponding to different responsive breakpoints.

If the images aren't actually changing between builds, the quickest thing you can do is add 'overwrite: false' to the upload API call, either directly or in an upload preset, so only the images that have changed will be re-uploaded and processed

Looking at the code for the plugin you're using, this could be done here: https://github.com/cloudinary-devs/gatsby-transformer-cloudinary/blob/4c47a6173cc39ca3131935a6fde7c84578d95a7f/packages/gatsby-transformer-cloudinary/upload.js#L15

_

A colleague - @jthawme - has created a pull request to enable an option for 'overwrite: false'. That might be the solution here.
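
For readers unfamiliar with the upload API, here is a minimal sketch of what "add overwrite: false to the upload API call" could look like. The helper name `buildUploadOptions` and the sample values are hypothetical and are not taken from the plugin's upload.js; `folder`, `public_id`, and `overwrite` are standard Cloudinary upload parameters.

```javascript
// Hypothetical helper (not the plugin's code): builds the options object
// that would be passed to the Cloudinary upload API.
function buildUploadOptions({ folder, publicId, overwrite = false }) {
  return {
    folder,
    public_id: publicId,
    // With overwrite set to false, an asset that already exists under the
    // same public ID is not replaced by the new upload.
    overwrite,
  };
}

const options = buildUploadOptions({
  folder: 'gatsby-cloudinary',
  publicId: 'example-image', // hypothetical public ID
});

console.log(options);
// With the Cloudinary Node SDK, the object would be used roughly as:
//   cloudinary.v2.uploader.upload(filePath, options)
```

As the rest of the thread shows, this setting alone did not stop transformations from being counted on each build.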

jlengstorf commented 4 years ago

overwrite: false definitely seems like the right call here. @Chuloo are you still leading maintenance on this project? I think #33 addresses this (along with a bunch of other improvements)

amcc commented 4 years ago

@jlengstorf thanks for following this up - would be amazing to have this in time for a launch we have early next week.

In case anyone else needs overwrite to be false without this - it can also be done with an upload preset on Cloudinary in the settings on the web interface: https://cloudinary.com/documentation/upload_presets. But having it in the plugin makes a lot more sense. Looks like the other improvements in #33 would be very useful too.

We currently have an ongoing issue with transformations happening despite setting overwrite to false. We have had the same support guy from Cloudinary look into this and they aren't sure why that is still happening.

Chuloo commented 4 years ago

Thanks for this @amcc. This is fixed in #33; you can specify overwrite in gatsby-config.js.

Chuloo commented 4 years ago

@amcc what other issue do you have with transformations happening despite setting overwrite to false?

amcc commented 4 years ago

@Chuloo we are still getting transformations on each upload, despite setting an upload preset to overwrite false. We have spoken to support who confirmed that no new uploads are happening, but they're seeing transformations and are investigating it.

If there's any reason for still seeing transformations you can think of let me know - we aren't doing any custom stuff with cloudinary. Only setting max-width on fluid and an aspect ratio - but nothing changes between builds.

It's a bit of an issue, as we have 11k images and get around 17k transformations in each build.

We will update the plugin and see if that helps things. Here's our config, by the way:

{
  resolve: "gatsby-transformer-cloudinary",
  options: {
    cloudName: process.env.CLOUDINARY_CLOUD_NAME,
    apiKey: process.env.CLOUDINARY_API_KEY,
    apiSecret: process.env.CLOUDINARY_API_SECRET,
    uploadFolder: "gatsby-cloudinary",
    breakpointsMaxImages: 3,
    createDerived: false,
  },
},

amcc commented 4 years ago

This might be helpful - I've exchanged a number of emails with Cloudinary support, who've been very helpful here:

Looking at your usage, at a high level I think there's an issue with the fact that the plugin is sending all the uploads every time the build is run - I know that's typical in the Jamstack world, but with so many API calls being made for files that already exist it seems like it would be better as a 'sync' operation but I'm not sure if that's possible with this plugin, instead of it pushing everything all the time.

That said, we don't count transformations against your quota unless calls to us are causing a file to be uploaded or require us to process an image / create a new version of an existing image.

Looking at your upload logs, I think I've figured out what's happening here - it's due to the responsive_breakpoints option in the upload API calls that the plugin is sending.

This tells us to detect ideal breakpoints to use for this image for a responsive design and return the details in the API response: https://cloudinary.com/documentation/responsive_images#responsive_breakpoint_generator - This necessitates creating several copies of the image, though these are later discarded (via "create_derived": false - if this was true, we'd save the created versions in your account and you'd see links for them in the API response in the 'derived' property)

My understanding was that overwrite: false would prevent this type of processing, so let me check it with our backend team.

If it's expected that we always generate the requested breakpoints even for reuploads with overwrite: false, there may be a solution via adjusting the plugin settings, build steps, or by using a different method of uploading, but let's check if it's simply a bug on our side first
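
To make the support explanation concrete, here is a rough sketch of how an upload call ends up carrying the `responsive_breakpoints` option, and how leaving it out avoids the breakpoint processing described above. The helper `withBreakpoints` and the numeric values are assumptions for illustration and are not the plugin's actual code; the keys inside `responsive_breakpoints` (`create_derived`, `bytes_step`, `min_width`, `max_width`, `max_images`) are the documented upload API parameters.

```javascript
// Hypothetical helper (not the plugin's code): adds responsive_breakpoints
// to the upload options only when explicitly requested.
function withBreakpoints(baseOptions, { useCloudinaryBreakpoints = false, maxImages = 3 } = {}) {
  if (!useCloudinaryBreakpoints) {
    // No breakpoint request is sent, so Cloudinary has no reason to
    // generate candidate versions of the image during upload.
    return { ...baseOptions };
  }
  return {
    ...baseOptions,
    responsive_breakpoints: [
      {
        create_derived: false, // candidates are generated, then discarded
        bytes_step: 20000,     // illustrative values only
        min_width: 200,
        max_width: 1000,
        max_images: maxImages,
      },
    ],
  };
}

const base = { folder: 'gatsby-cloudinary', overwrite: false };
const plain = withBreakpoints(base);
const withBp = withBreakpoints(base, { useCloudinaryBreakpoints: true });

console.log('responsive_breakpoints' in plain);  // false
console.log(withBp.responsive_breakpoints[0].max_images); // 3
```

The point of the sketch is that `create_derived: false` only controls whether the generated versions are kept, not whether they are generated.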

Chuloo commented 4 years ago

Hi, @amcc thanks for sharing this, any word from the support team at Cloudinary? From the last info, it seems setting useCloudinaryBreakpoints to false (which is the default) in gatsby-config.js should fix this. Let me know 👍

Chuloo commented 4 years ago

Closing this out now, assuming it's been resolved. Please re-open asap if this is still persistent.

jasonbiondo commented 4 years ago

@jlengstorf @Chuloo This has not been resolved. As soon as I ran gatsby clean and then ran gatsby develop, this plugin deleted all of my images and reuploaded everything again. Even though I have overwrite set to false. Here is my config:

{
  resolve: "gatsby-transformer-cloudinary",
  options: {
    cloudName: process.env.CLOUDINARY_CLOUD_NAME,
    apiKey: process.env.CLOUDINARY_API_KEY,
    apiSecret: process.env.CLOUDINARY_API_SECRET,
    uploadFolder: "myfolder",
    useCloudinaryBreakpoints: false,
    overwrite: false,
  },
}

I have a store with 3,000 images and this makes this plugin unusable

jasonbiondo commented 4 years ago

Also even with these settings when I deploy through netlify, all of my images get redeployed and it racks up transformations - also not cool and expensive.

Chuloo commented 4 years ago

Hi @jasonbiondo, what transformations are you currently running on the images? Seems the challenge here is that the images are not being re-uploaded, but a transformation is applied to the original image stored on Cloudinary, on each build process.

jasonbiondo commented 4 years ago

The images get reuploaded for me too if I clean my cache or run a prod build, but I don't need to keep reuploading images all the time.

I don't think I'm really doing any transformations. Just querying the result set and then passing it to gatsby-image

allCloudinaryAsset { nodes { fluid { ...CloudinaryAssetFluid } } }

<Img className={classes.image} fluid={product.cloudinaryImages[0]} />

Chuloo commented 4 years ago

@jasonbiondo can you send a URL of any of the re-uploaded images? The complete URL on your site. Quality transformations q_auto and f_auto are applied by default and this could be the reason for the transformations even when not specified. Send an image URL from your site so I can confirm the applied transformations.

jasonbiondo commented 4 years ago

@Chuloo I see. Yeah here is a sample url: https://res.cloudinary.com/shoptrekeffect/image/upload/w_400,f_auto,q_auto/v1601221747/trekeffect/.cache/caches/gatsby-source-filesystem/4e5bc47a4bc729dae23d0443877d71d7/product-image-870463026_d1baa014-265a-4df9-a706-f50378409424

Chuloo commented 4 years ago

aha, I can see width, quality, and format transformations applied automatically as you didn't specify it in your query. On it now.
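
For anyone wanting to check their own URLs the same way, here is a small sketch that pulls the transformation segment out of a Cloudinary delivery URL. The function name `parseTransformations` is hypothetical, and the parsing is a heuristic: it only inspects the first path segment after `/upload/` and does not handle chained transformations.

```javascript
// Hypothetical helper: lists the transformations found in the first path
// segment after "/upload/" of a Cloudinary delivery URL.
function parseTransformations(url) {
  const { pathname } = new URL(url);
  const marker = '/upload/';
  const start = pathname.indexOf(marker);
  if (start === -1) return [];

  const firstSegment = pathname.slice(start + marker.length).split('/')[0];

  // A version segment such as "v1601221747" means no transformation is present.
  if (/^v\d+$/.test(firstSegment)) return [];

  const parts = firstSegment.split(',');
  // Heuristic: transformation parts look like "w_400" or "f_auto".
  const looksLikeTransformation = parts.every((p) => /^[a-z]{1,3}_.+/.test(p));
  return looksLikeTransformation ? parts : [];
}

const sample =
  'https://res.cloudinary.com/shoptrekeffect/image/upload/w_400,f_auto,q_auto/v1601221747/trekeffect/.cache/caches/gatsby-source-filesystem/4e5bc47a4bc729dae23d0443877d71d7/product-image-870463026_d1baa014-265a-4df9-a706-f50378409424';

console.log(parseTransformations(sample)); // [ 'w_400', 'f_auto', 'q_auto' ]
```

For the sample URL above this yields the width, format, and quality transformations mentioned in the reply.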

jasonbiondo commented 4 years ago

yeah, they must be. Please pop in a fix.

Chuloo commented 4 years ago

@jasonbiondo published a new version. You can specify enableDefaultTranformations in the plugin options in gatsby-config if you want the default quality transformations. I think for smaller projects it's trivial but the cost is more impactful in larger projects. Kindly test, possibly with a smaller sample size to see if it's all good now.

Also, I'm looking at a mod for the uploads when you run gatsby clean. When clean is run, all generated public files and cached files are deleted, hence the Cloudinary reupload is triggered. With overwrite set to false, it shouldn't upload though. Let us know if you get any upload/transformation usage in your Cloudinary account too.

jasonbiondo commented 4 years ago

@Chuloo enableDefaultTranformations removes the transformations from the image urls but when running gatsby clean and then rerunning gatsby develop it reuploaded the images even with overwrite false and my transformations still run again. So far it looks like both of those issues are not fixed.

jasonbiondo commented 4 years ago

@Chuloo So I was going to put in a pull request because I think I found the fix to overwrite not being passed but I can't publish branches so... this is the issue.

image

Add those in and then the param will be passed

jasonbiondo commented 4 years ago

@jlengstorf Any idea why my query is failing on this? I see that you wrote the gatsby-node.js file which creates the image object but for some reason... no dice for me.

image

jasonbiondo commented 4 years ago

@jlengstorf For more clarity, this only happens on the initial query after I run gatsby develop:

image

Chuloo commented 4 years ago

@jasonbiondo thanks for this, I think it makes sense to pull the overwriteExisting value from the plugin options like the other options. Pushing the change now.

Chuloo commented 4 years ago

Published a new version @1.1.2. Does this solve your issue of overwrites? @jasonbiondo

jasonbiondo commented 4 years ago

@Chuloo yeah it has resolved transformations and image uploading. Now if I could only get the query issue fixed so I can load my app I'd be in decent shape.

Chuloo commented 4 years ago

@jasonbiondo I tried reproducing the error in the below steps but failed.

Here's the response image

Also, here's my gatsby config file below. Let me know if I'm missing something in your process. image

jasonbiondo commented 4 years ago

Even with the same config I still get the error. Supposedly ECONNRESET could be like a timeout connection with Cloudinary or a limit on the fetch. Where does the fetching happen when a query is called? Is it possibly some axios call that is timing out because I have so many images? Keep in mind, when I query in graphql the first query throws that error but then the error stops and I get the srcsets. However, when building my project all of the queries are failing even though they run in graphql.

jasonbiondo commented 4 years ago

@Chuloo for more information - When I "limit" the query it runs at 2000 but breaks at 2500. Any ideas?

export const query = graphql`
    {
        allCloudinaryAsset(limit: 2500) {
            nodes {
                fluid {
                    ...CloudinaryAssetFluid
                }
            }
        }
    }
`

Chuloo commented 4 years ago

There's a timeout value of 5 minutes passed in https://github.com/cloudinary-devs/gatsby-transformer-cloudinary/blob/master/packages/gatsby-transformer-cloudinary/upload.js; however, this applies to uploads and not to the queries. How long does it take to build the queries for 2000 images?

I'm also looking at the internals of Gatsby to see if there is an existing issue on building large queries like this.

Chuloo commented 3 years ago

@jasonbiondo are you able to run the queries successfully now?

jasonbiondo commented 3 years ago

I can, but only because I added "build": "env-cmd -f .env.production gatsby build -H 127.0.0.0" to my package.json which is a bit of a hack. Otherwise, the large query will fail for me. Also, it's still not good because it would be nice if the image nodes were in the same location as my shopify product image nodes because I have to write a special query to get the assets and then do all this crazy filtering to get the correct images. It would be nice to have the cloudinary nodes where my products are so I can just easily add them to the shopify product query and I'd have to do no filtering that way. Any ideas on how to do this?

jlengstorf commented 3 years ago

@jasonbiondo connecting different nodes is probably outside the scope of this plugin, but you can do it with Gatsby's built-in APIs — check out https://www.gatsbyjs.com/docs/schema-customization/ for details on how you can pull the two types into a single custom type

jasonbiondo commented 3 years ago

@jlengstorf - Cool YouTube vids btw. Rather than using the schema, I was looking into just using this plugin to upload the files, and was thinking it would make more sense to use Cloudinary's React library rather than gatsby-image. This seems like it would be easy, since I would just have to pass in the public ID, aka the Shopify product photo name, rather than querying allCloudinaryAssets and doing filtering. Would that be wise, or is gatsby-image still better? See - https://cloudinary.com/documentation/react_image_manipulation

jlengstorf commented 3 years ago

@jasonbiondo you can use the manual gatsby-image integration in that case? if you already have the public ID it will let you create a fixed/fluid object without GraphQL https://gatsby-transformer-cloudinary.netlify.app/manual/

Chuloo commented 3 years ago

Closing this out, as the issue has been resolved and no further questions were asked. Please feel free to reopen if the issue is still prevalent. 🙏