Closed matteosecli closed 6 years ago
Hi matteosecli,
Many thanks for the work.
Just to clarify, by 'author' I think you meant the author that created the highlights, not the article? In that case, I've already retrieved the author in the `getUserName()` function in `menotexport.py`, and the `author` field is then passed on to the `meta` attribute associated with the extracted annotations, where it can be accessed easily. So there's no need to do that again in the queries you modified.
I haven't tested, but I think the only change needed is something like this in `exportPdf()`:
anno=pdfannotation.createHighlight(hjj['rect'], cdate=hjj['cdate'], color=hjj['color'], author=annotations.meta['user_name'])
and similarly for the `createNote()` part.
Would you like to give it a try and see whether that does what you want?
Hi @Xunius, sorry for the (very) late reply; I was waiting for a friend of mine to help me carry out some tests.
I've set up a collaborative folder in Mendeley in order to test what happens when multiple people annotate the same document; I'll go step by step.
> Just to clarify, by 'author' I think you meant the author that created the highlights, not the article?

Yes, by 'author' I mean the Mendeley user that created the highlight, not the author of the article.

> In that case, I've already retrieved author in the `getUserName()` function in `menotexport.py`, and the `author` field is then passed onto the `meta` attribute associated with extracted annotations and can be accessed from there easily.

I have to say that I actually missed that! However, there are a couple of objections:

- `getUserName()` needs a fix, because it has to look for the Profile that has the attribute `isSelf` set to `true`. In fact, when you collaborate with other Mendeley users, your local database also contains their profile information in the `Profiles` table; you, the owner of the account in use, are identified only by that attribute being set to `true`. The current function instead assumes that the owner of the account is the first person that appears in the `Profiles` table, which is not true. Indeed, after several annotate->save->exit operations in Mendeley, my friend was listed first in the table and was therefore wrongly assumed to be the user by Menotexport itself. But this would be the subject of another pull request.
- Even if the user returned by `getUserName()` is the correct one, this doesn't mean at all that he's also the author of all the notes in a document! Indeed, I've tried with one of these test documents annotated by multiple people; with your suggestion, all the notes appear as created by the same person. With the edits I've suggested, each note is correctly coupled to the Mendeley user that created it. (btw: if you don't have any collaborative documents and you want to join the collaborative test folder, just let me know)

As a final note, I've realized that if `firstName` or `lastName` were `NULL`, the script was crashing on the lines that join them – the ones that I've added. So, I've slightly modified the joining procedure in order to skip these `NULL` fields – and so far in my tests, it seems to work.
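To make the two points about the account owner concrete, here is a minimal sketch (not Menotexport's actual code) of a user-name lookup that selects the profile flagged as `isSelf` and skips `NULL` name parts. The function name `get_user_name`, the toy in-memory schema and the sample rows are hypothetical, and it assumes that `isSelf` is stored as the text `'true'`/`'false'`; check your own database before relying on that.

```python
import sqlite3


def get_user_name(db):
    """Return the account owner's display name, or None if no profile is flagged as self."""
    # Assumption: isSelf is stored as the text 'true' / 'false'.
    row = db.execute(
        "SELECT firstName, lastName FROM Profiles WHERE isSelf = 'true'"
    ).fetchone()
    if row is None:
        return None
    # filter(None, ...) drops NULL (None) and empty name parts before joining.
    return ' '.join(filter(None, row))


# Toy database standing in for the Mendeley one (hypothetical, reduced schema).
db = sqlite3.connect(':memory:')
db.execute(
    "CREATE TABLE Profiles (uuid TEXT, firstName TEXT, lastName TEXT, isSelf TEXT)"
)
db.executemany(
    "INSERT INTO Profiles VALUES (?, ?, ?, ?)",
    [
        ('uuid-friend', 'Alice', 'Rossi', 'false'),  # collaborator, listed first
        ('uuid-me', 'Bob', None, 'true'),            # account owner, NULL lastName
    ],
)

user_name = get_user_name(db)
print(user_name)
```

With these sample rows the collaborator is listed first, yet the lookup still picks the owner, and the `NULL` last name does not break the join.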
Hi matteosecli,
Many thanks for coming back and all the work. I've merged your PR #28.
I didn't do collaboration in Mendeley, so that's a use case that has been largely neglected, and I think you made a valid point that the highlight authors (and note authors as well, right?) would be different in that use case. But the thing is, I thought you wouldn't be coming back, so I've made quite a few changes in the code (to address the multiple attachment issue) and now `getHighlights()`, `getNotes()` and some other functions look a bit different. To be honest, I'm not very experienced in handling conflicting merges; I believe your changes will be based on an old base should I commit my changes. So, do you think it would be better to let me finish my multi-attachment issue fix and incorporate your changes myself (to save your time), or would you rather wait for my commit and do your changes again (so we retain your credits)?
Thanks again for the contribution.
By multiple attachment issue you are referring to https://github.com/Xunius/Menotexport/issues/26, right?
Anyway, for me it's ok either way! I don't know the extent of the new changes; I think this PR can wait until you commit all the new changes related to that issue, and then we can see whether it has conflicts at that point. If it has conflicts that are better resolved by rewriting these changes from scratch and you think it would be more efficient for you to directly incorporate them, it's totally fine for me! Or if you don't have time I can make a new PR as well. 😉
So, I'd say to check back here once you commit those changes.
In the meantime, I'm looking into something else – which I hope I can report in a new thread for a discussion as soon as I have some minimal working examples/data.
Hi matteosecli
I've implemented your suggested changes (with minor differences) and pushed. Here is what I did:
- In `menotexport.py`, the `getHighlights()` function queries the "author" field associated with each highlight box, and if "author" is empty, it uses the name from the "profile". You added a filtering step before joining the first and last names, but as I already have a `getUserName()` function, I added the filtering to `getUserName()`, and assigned the return value of `getUserName()` to "author" if "author" is empty. The same goes for the `getNotes()` function, which fetches sticky notes.
- In `lib/exportpdf.py`, I added the new "author" argument as you suggested.

Would you like to give it a test and see whether it works as intended?
Hi @Xunius,
I've tested the latest `master` version. Now all the highlights have a non-empty author, but the problem is that the author is always myself even if the highlight was added by another person!
I'll try to explain better; maybe I was a bit messy last time. What I was doing was the following:

- I query `FileHighlights.author`, `Profiles.firstName` and `Profiles.lastName`, where `Profiles.firstName` and `Profiles.lastName` are the fields corresponding to the same profile uuid (`Profiles.uuid`) that authored the highlight (which is `FileHighlights.profileUuid`).
- The author should be stored in `FileHighlights.author` (and maybe it was in older versions), but in my case this field is actually always empty. Each highlight, instead, has a `FileHighlights.profileUuid`. So, I first try to set `FileHighlights.author` as the author of the highlight; if this is empty (i.e. in my case, always) I then set "`Profiles.firstName` `Profiles.lastName`" as the author of the highlight, which can be different from `username` (which is also the merge of `Profiles.firstName` and `Profiles.lastName`, but only of the Profile with `isSelf=true`; the other profiles, i.e. your co-workers, have other Profile entries with `isSelf=false`).

So, your lines https://github.com/Xunius/Menotexport/blob/6a8ba455744fb0c4217832980b7d87dd570cb610/menotexport.py#L453-L455 always set myself as the author of every highlight. Instead, the change I was proposing, i.e. the lines https://github.com/Xunius/Menotexport/blob/fa6f6448cf3d02858de3b00f463d0c2c609773d5/menotexport.py#L411-L414, sets the name linked to the highlight's `profileUuid` as the correct author of the highlight.
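As a rough illustration of the per-highlight lookup described above, the sketch below joins each highlight to the profile that created it and falls back to the profile name only when `FileHighlights.author` is empty. It is not the code from either linked commit: the table and column names are the ones mentioned in this thread, while the reduced schema, the `id` column and the sample rows are made up for the example.

```python
import sqlite3

# Toy database (hypothetical, reduced schema; the real tables have more columns).
db = sqlite3.connect(':memory:')
db.executescript("""
CREATE TABLE Profiles (uuid TEXT, firstName TEXT, lastName TEXT, isSelf TEXT);
CREATE TABLE FileHighlights (id INTEGER, author TEXT, profileUuid TEXT);
INSERT INTO Profiles VALUES ('u1', 'Alice', 'Rossi', 'false');
INSERT INTO Profiles VALUES ('u2', 'Bob', NULL, 'true');
INSERT INTO FileHighlights VALUES (1, NULL, 'u1');
INSERT INTO FileHighlights VALUES (2, '', 'u2');
INSERT INTO FileHighlights VALUES (3, 'Carol', 'u1');
""")

query = """
SELECT FileHighlights.id, FileHighlights.author,
       Profiles.firstName, Profiles.lastName
FROM FileHighlights
LEFT JOIN Profiles ON Profiles.uuid = FileHighlights.profileUuid
ORDER BY FileHighlights.id
"""

highlight_authors = {}
for hl_id, author, first, last in db.execute(query):
    # Prefer the stored author; if it is NULL or empty, build the name from
    # the profile that created the highlight, skipping NULL name parts.
    highlight_authors[hl_id] = author or ' '.join(filter(None, [first, last]))

print(highlight_authors)
```

Each highlight ends up with the name of the profile that created it, rather than the name of the account owner.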
So:

- You need either `FileHighlights.profileUuid` or the fields `Profiles.firstName` and `Profiles.lastName` in the highlights query.
- It may be better to query `FileHighlights.profileUuid` instead of the fields `Profiles.firstName` and `Profiles.lastName`. This way, you can also drop the line `LEFT JOIN Profiles ON Profiles.uuid=FileHighlights.profileUuid` in my suggested changes and speed up the query quite a bit.
- Build a dictionary with `Profiles.uuid` as the keys and `' '.join(filter(None, [Profiles.firstName, Profiles.lastName]))` as the values.
- In the `getHighlights()` function, once the query returns the `FileHighlights.profileUuid` for each highlight, assign the corresponding Profile name contained in the dictionary as the highlight's author, if `FileHighlights.author` is empty.

What do you think would be better to do? If you don't want to rewrite these bits and you are not planning to make any incompatible changes, I can make these modifications, test them, and send them as a PR in the next few days. If instead you prefer to write the code directly, I'm always here to test anyway. 😉
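The dictionary variant could be sketched roughly as follows: read `Profiles` once into a uuid-to-name mapping, then run the highlights query without any join. This is only an illustration under the same assumptions as the thread (the table and column names mentioned above); the reduced schema, the `id` column, the variable names and the sample rows are hypothetical.

```python
import sqlite3

# Toy database (hypothetical, reduced schema).
db = sqlite3.connect(':memory:')
db.executescript("""
CREATE TABLE Profiles (uuid TEXT, firstName TEXT, lastName TEXT, isSelf TEXT);
CREATE TABLE FileHighlights (id INTEGER, author TEXT, profileUuid TEXT);
INSERT INTO Profiles VALUES ('u1', 'Alice', 'Rossi', 'false');
INSERT INTO Profiles VALUES ('u2', 'Bob', NULL, 'true');
INSERT INTO FileHighlights VALUES (1, NULL, 'u1');
INSERT INTO FileHighlights VALUES (2, '', 'u2');
INSERT INTO FileHighlights VALUES (3, 'Carol', 'u1');
""")

# Step 1: build the uuid -> display name dictionary with a single query.
profile_names = {
    uuid: ' '.join(filter(None, [first, last]))
    for uuid, first, last in db.execute(
        "SELECT uuid, firstName, lastName FROM Profiles"
    )
}

# Step 2: query the highlights without a JOIN and resolve names in Python.
authors = {}
for hl_id, author, profile_uuid in db.execute(
    "SELECT id, author, profileUuid FROM FileHighlights ORDER BY id"
):
    # Use the stored author if present; otherwise look up the profile name,
    # defaulting to an empty string for an unknown profileUuid.
    authors[hl_id] = author or profile_names.get(profile_uuid, '')

print(profile_names)
print(authors)
```

The same dictionary could be reused for the notes query, so `Profiles` only needs to be read once per run.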
I get your points now. I never did co-authoring before, so I thought that was why my `FileHighlights.author` is always empty. It appears that it's indeed necessary to query the `Profiles` table, but I wasn't expecting that doing this would make it much slower. My profiling suggests the slowest parts are those related to PDF processing via `PyPDF2` and `pdfminer`; I tried to multi-thread some function calls but only got a negligible speed gain (maybe I'm doing something wrong). But your dictionary idea sounds good.
I have some free time tomorrow, so how about me re-doing these changes as we understand each other quite well already, and I'll let you continue on the "Wrong coordinate ordering in "Rectangle Highlight"" fix (https://github.com/Xunius/Menotexport/issues/29) as you have a much better understanding in that regard?
> I have some free time tomorrow, so how about me re-doing these changes as we understand each other quite well already, and I'll let you continue on the "Wrong coordinate ordering in "Rectangle Highlight"" fix (#29) as you have a much better understanding in that regard?

That sounds totally fine for me! 😃
Closing since these changes, with the discussion that followed in this thread, were reimplemented by @Xunius in https://github.com/Xunius/Menotexport/commit/6a8ba455744fb0c4217832980b7d87dd570cb610 and https://github.com/Xunius/Menotexport/commit/e6eebeba170356b9a0f31dc22bb7621d2dd93138.
I've noticed that, by opening an exported PDF file with a PDF reader, the `author` field was empty. So I looked into my Mendeley database and I've found that the `author` field was empty, too; however, there was a `profileUuid` that links to profile information (& that doesn't seem to change even if you change your email address).
I've added this information into the query to the database; the author is by default the `author` field in the database; however, if this field is empty, the author's name is constructed by merging the `firstName` and `lastName` fields of the database.
I've also added an `author` field to the highlights dictionary because it was completely missing, although the relevant function `pdfannotation.createHighlight()` already had all the things in place.
I've tested the changes with my database and Mendeley 1.18, although I don't know whether these fields were present or not in much older versions of Mendeley's databases (imho I don't think so, because an account was required even for much older versions; I've checked on Mendeley 1.17.11 and they are there).
I'm not a Python expert, and even less an SQL expert (I've never used SQL in my life), so I apologize in advance for any mistakes!
PS: The ordering of the new fields looks a bit random, but it's just because I've tried to change the code as little as possible.