KurtCode / PDFKitten

A framework for extracting data from PDFs in iOS
MIT License
391 stars 113 forks

CGPDFStringRef to NSString * #40

Open Ismael-Schellemberg opened 12 years ago

Ismael-Schellemberg commented 12 years ago

Hi, I'm working on a project where I need to highlight parts of the text by location rather than by match, so I took your project and modified it slightly so that [scanner selections] returns the frame of every single character instead of only the frames where the keyword matches.

The change was fairly simple and it works like a charm. However, I did find a "bug": some CGPDFStringRefs are converted incorrectly (this happens with the PDF downloaded from here).

When the scanner starts, it reads the first "A" (from "A cat in his...") and gets an error when converting it:

- (NSString *)stringWithCode:(int)code
{
    static NSString *singleUnicodeCharFormat = @"%C";
    NSString *characterName = [names objectForKey:[NSNumber numberWithInt:code]];
    unichar unicodeValue = [FontFile characterByName:characterName];
    return [NSString stringWithFormat:singleUnicodeCharFormat, unicodeValue];
}

Here unicodeValue is 0, so the string that gets returned contains the wrong character.
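
One way to avoid formatting a zero value into the result is to check the lookup before using it. The following is a minimal C sketch of that guard, not PDFKitten's actual code. `glyph_table`, `unicode_for_glyph_name`, and `decode_code` are hypothetical stand-ins for the `names` dictionary and `[FontFile characterByName:]`, and falling back to the raw code is an assumption that only holds for single-byte, ASCII-compatible encodings.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a glyph-name-to-Unicode table. */
typedef struct {
    const char *name;
    uint16_t unicode;
} GlyphEntry;

static const GlyphEntry glyph_table[] = {
    { "A",     0x0041 },
    { "space", 0x0020 },
    { "fi",    0xFB01 },
    { "fl",    0xFB02 },
};

/* Returns 0 when the name is NULL or unknown, mirroring the failure described above. */
uint16_t unicode_for_glyph_name(const char *name)
{
    if (name == NULL) return 0;
    size_t count = sizeof glyph_table / sizeof glyph_table[0];
    for (size_t i = 0; i < count; i++) {
        if (strcmp(glyph_table[i].name, name) == 0) return glyph_table[i].unicode;
    }
    return 0;
}

/* Decode one character code: prefer the glyph name, fall back to the raw code
   for printable ASCII (an assumption about the encoding), and otherwise return
   U+FFFD (the replacement character) instead of 0. */
uint16_t decode_code(int code, const char *glyph_name)
{
    uint16_t value = unicode_for_glyph_name(glyph_name);
    if (value != 0) return value;
    if (code >= 0x20 && code < 0x7F) return (uint16_t)code;
    return 0xFFFD;
}
```

With a guard like this, a code whose glyph name is missing from the table still yields a usable character (or a visible replacement character) rather than a zero value.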

This happens with about 40% of the characters in that PDF. I tried using CGPDFStringCopyTextString instead, like this:

CFStringRef cfStr = CGPDFStringCopyTextString(string);
NSString *cidString = [NSString stringWithString:(NSString *)cfStr];
NSString *unicodeString = [[NSString stringWithString:(NSString *)cfStr] lowercaseString];
CFRelease(cfStr);

With this, all the characters are converted correctly.

Is there a reason I should be using your method? Or should I (and possibly you too) use the CGPDFStringCopyTextString function?

If I can get in contact with you, I could provide further details and screenshots.

Anyway, thanks for the great work you've done :)

KurtCode commented 11 years ago

Hi, thanks for your input. Sorry it has taken me this long to answer.

Anyway, the first method finds ligatures, which CGPDFStringCopyTextString won't.
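
For context on why ligatures matter when matching keywords: a font may draw "fi" as a single glyph, which then decodes to one code point such as U+FB01 rather than to the two letters. A minimal C sketch of expanding the Latin ligature code points U+FB00 to U+FB04 into their component letters, so that a plain-text search can still match them, might look like this. `expand_ligature` is a hypothetical helper and not part of PDFKitten or Core Graphics.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns the component letters for a Latin ligature code point in the
   Unicode Alphabetic Presentation Forms block, or NULL for any other
   code point. */
const char *expand_ligature(uint16_t code_point)
{
    switch (code_point) {
    case 0xFB00: return "ff";
    case 0xFB01: return "fi";
    case 0xFB02: return "fl";
    case 0xFB03: return "ffi";
    case 0xFB04: return "ffl";
    default:     return NULL;
    }
}
```

A caller would append the returned letters to the text being searched when the result is non-NULL, and append the original character otherwise.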

rayray commented 11 years ago

@Ismael-Schellemberg Where in the scanner did you insert your CGPDFStringCopyTextString implementation?