Closed by impca 8 years ago
I'm not sure that's the expected behavior :confused:
Contrary to other extensions in the white list, PDFs have a title; I'd expect hubot-url-title to be able to extract it.
I'm still traveling, but ignoring PDFs seems reasonable until we have the feature @pchaigno is talking about.
@pchaigno are you familiar with the PDF format? Do PDFs in most cases have something equivalent to the HTML <title> tag?
I'm not sure all PDF documents have a title. In any case, I think mozilla/pdf.js could extract it (via PDFDocument.documentInfo).
I'd expect hubot-url-title to try to extract a title as much as possible, but it might not be within the scope of this package ;)
I think it would be a cool feature :)
That is an interesting idea.
I am kinda torn between implementing it here or creating a separate hubot-pdf-title.
I think it would be better to implement it here, because if you create a separate package, you'll end up re-developing some of the same logic :/
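To illustrate the shared-logic point, here is a minimal sketch of how one package could pick a title extractor by content type. All names here (`htmlTitle`, `titleFor`, the `pdfTitle` parameter) are hypothetical and not taken from hubot-url-title; the PDF extractor is passed in as a function because the actual pdf.js integration isn't settled in this thread.

```javascript
// Hypothetical helper names; none of these come from hubot-url-title itself.

// Pull the text of the first <title> element out of an HTML string.
function htmlTitle(html) {
  const match = /<title[^>]*>([\s\S]*?)<\/title>/i.exec(html);
  // Collapse runs of whitespace so multi-line titles come out on one line.
  return match ? match[1].replace(/\s+/g, " ").trim() : null;
}

// Choose an extractor from the Content-Type header value.
// `pdfTitle` is an assumed callback (e.g. something built on pdf.js).
function titleFor(contentType, body, pdfTitle) {
  const type = (contentType || "").split(";")[0].trim().toLowerCase();
  if (type === "text/html") return htmlTitle(body);
  if (type === "application/pdf" && typeof pdfTitle === "function") {
    return pdfTitle(body);
  }
  return null; // unknown type, or no PDF extractor available yet
}
```

With this shape, the fetching and content-type handling would stay in one place, and PDF support would amount to supplying the `pdfTitle` function.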
Let's ignore PDFs for now, as we can't get the title from them. If we do #14, we will just remove the ignore.
Added pdf to the list of ignored file extensions. Hubot will no longer try to get the title of PDFs.
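For anyone curious, an extension-based ignore check could look roughly like the sketch below. This is not the package's actual code: the list contents and the names `IGNORED_EXTENSIONS` and `isIgnoredUrl` are assumptions made for illustration.

```javascript
// Hypothetical list; the real package's list and names may differ.
const IGNORED_EXTENSIONS = ["jpg", "jpeg", "png", "gif", "pdf"];

// Returns true when the URL path ends in one of the ignored extensions.
function isIgnoredUrl(rawUrl, ignored = IGNORED_EXTENSIONS) {
  let pathname;
  try {
    // Parsing drops the query string and fragment from the comparison.
    pathname = new URL(rawUrl).pathname;
  } catch (err) {
    return false; // not a parseable absolute URL; leave it to the caller
  }
  const lastSegment = pathname.split("/").pop() || "";
  const dot = lastSegment.lastIndexOf(".");
  if (dot <= 0) return false; // no extension, or a dotfile such as ".pdf"
  const ext = lastSegment.slice(dot + 1).toLowerCase();
  return ignored.includes(ext);
}
```

Matching on the parsed path rather than the raw string means a link like `paper.PDF?download=1` is still caught, while a page whose path merely contains the word "pdf" is not. Once title extraction for PDFs exists, dropping `"pdf"` from the list would be the only change needed here.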