GitbookIO / gitbook-convert

CLI to convert an existing document to a GitBook.

35 character limit on images is creating duplicate image file names and overwriting other images #13

Open salfej opened 8 years ago


Hi all,

When importing images from a document and looking at the raw XML, I noticed that the docx converter names each image after its description (alt text) rather than its file name. Many images, especially ones taken from the internet whose description is a long file path, are therefore truncated to the same 35 characters and treated as duplicates. The later image then overwrites one that has already been imported.

    // imgExporter exports inline images to the assets folder and apply src attribute to HTML correctly
    var imgExporter = mammoth.images.inline(function(element) {
        return element.read().then(function(imageBuffer) {
            // Set image file name
            var imgFilename;

            // Use altText for image name
            if (!!element.altText) {
                imgFilename = element.altText;

                // Remove extension in altText if is equal to contentType
                var contentType = 'image/'+path.extname(imgFilename).slice(1);
                if (element.contentType === contentType) {
                    imgFilename = imgFilename.split('.').slice(0, -1).join('.');
                }

                // Shorten if too long
                imgFilename = imgFilename.slice(0, 35).trim();
            }

            // Normalize filename
            imgFilename = normall.filename(imgFilename);

The shortening step is what causes the overwrite: the truncated name is used without checking whether a file with that name already exists. The default name selection seems to take this into account by adding an increment to the name.

            // Or use default name -> img-NN.ext
            if (!imgFilename) {
                imgFilename = 'img-'+imgCounter;
                imgCounter++;
            }

Is the 35 character limit something imposed by mammoth.js and reused here for consistency, or is this something that could be removed?
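If the limit has to stay, one option might be to track the names that have already been handed out and add a suffix on collision, much as the default `img-NN` path does with its counter. Below is a minimal sketch of that idea. `createUniqueNamer` and `uniqueName` are hypothetical names I made up for illustration; they are not part of gitbook-convert or mammoth.js, and the example URL is invented.

```javascript
// Hypothetical helper, not part of gitbook-convert or mammoth.js.
// Returns a function that maps a candidate name to one that the
// same instance has not handed out before.
function createUniqueNamer(maxLength) {
    var used = Object.create(null); // names already handed out

    return function uniqueName(baseName) {
        // Same shortening as in the converter
        var base = baseName.slice(0, maxLength).trim();
        var candidate = base;
        var n = 1;

        // Append -1, -2, ... until the name is free
        while (used[candidate]) {
            candidate = base + '-' + n;
            n++;
        }

        used[candidate] = true;
        return candidate;
    };
}

// Two descriptions that only differ after the 35th character
var uniqueName = createUniqueNamer(35);
var prefix = 'http://example.com/some/very/long/path/to/picture';
var first = uniqueName(prefix + '-a');
var second = uniqueName(prefix + '-b');
console.log(first);
console.log(second);
```

In the converter this would presumably need to run after the `normall.filename` call rather than before it, since normalization could also map two different names to the same result. I have not tested it against the actual code.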