GoogleCodeExporter opened 9 years ago
On investigation, the natural language generation aspect of this does not
appear too tricky. The main difficulty will be the performance overhead of
obtaining the appropriate summary data from the database on the fly: any delay
to submission would negate the positive effect of this modification.
Original comment by johnvanb...@gmail.com
on 18 Nov 2013 at 9:48
David Roy's comment:
I had a thought about the messages generated when records are submitted: could
they be carried forward to the verification system and pre-populate the email
message for verifiers when they select 'email recorders'? Or does this create a
longer-term storage problem?
My thoughts:
I suspect that storing the generated messages might require a fair bit of disk
space, but they are not going to go into an area of the db that needs to be
kept loaded in memory, unlike the occurrences cache table. Therefore I don't
think that storing the messages is a particular problem.
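A minimal sketch of David's idea, assuming the generated message is stored in its own table keyed by occurrence id and read back when a verifier chooses 'email recorders'. The table name `record_feedback` and all function names are hypothetical, and SQLite is used only so the example runs on its own.

```python
import sqlite3


def create_schema(conn: sqlite3.Connection) -> None:
    # Hypothetical table, kept apart from the occurrences cache table so the
    # stored text does not compete for memory with frequently queried data.
    conn.execute(
        """CREATE TABLE IF NOT EXISTS record_feedback (
               occurrence_id INTEGER PRIMARY KEY,
               message       TEXT NOT NULL
           )"""
    )


def store_feedback(conn: sqlite3.Connection, occurrence_id: int, message: str) -> None:
    # Save the message shown to the recorder at submission time.
    conn.execute(
        "INSERT OR REPLACE INTO record_feedback (occurrence_id, message) VALUES (?, ?)",
        (occurrence_id, message),
    )


def prefill_verifier_email(conn: sqlite3.Connection, occurrence_id: int) -> str:
    # Called when a verifier selects 'email recorders' for this record.
    row = conn.execute(
        "SELECT message FROM record_feedback WHERE occurrence_id = ?",
        (occurrence_id,),
    ).fetchone()
    if row is None:
        return ""  # nothing stored, so the verifier starts from a blank email
    return "When you submitted this record you were told:\n\n" + row[0]
```

For example, `store_feedback(conn, 42, "Thanks for your record...")` at submission time lets `prefill_verifier_email(conn, 42)` return a draft email that quotes the original message.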
Original comment by johnvanb...@gmail.com
on 19 Nov 2013 at 9:15
The trick is going to be working out how to extract all the summary data from
the database quickly enough to support the generation of these statements
without a detrimental effect on record submission. I think it might be a good
idea to keep the existing “thanks” message but add an extra link, “find out
more about your record...”. When clicked, this can go to the database and
generate the language response, so a slight delay will be acceptable. This also
has the advantage of not stating the obvious to seasoned recorders all the time.
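The two-stage process could look roughly like this: submission returns the quick "thanks" message with a link, and the expensive summary work only runs when the link is followed. This is a sketch under assumptions: `fetch_summary` is a hypothetical stand-in for the slow database queries, the `/records/<id>/more` path is invented, and `functools.lru_cache` is used so that repeated clicks on the same record do not hit the database again.

```python
from functools import lru_cache


def fetch_summary(occurrence_id: int) -> dict:
    # Hypothetical stand-in for the slow summary queries; returns placeholder data.
    return {"species": "species A", "records_in_square": 12}


def submit_record(occurrence_id: int) -> str:
    # Stage 1: fast path at submission time, no summary queries at all.
    return (
        "Thanks for your record. "
        f"Find out more about your record: /records/{occurrence_id}/more"
    )


@lru_cache(maxsize=1024)
def more_about_record(occurrence_id: int) -> str:
    # Stage 2: only runs when the recorder clicks the link, so a short delay is
    # acceptable; the result is cached per occurrence id.
    summary = fetch_summary(occurrence_id)
    return (
        f"Your record of {summary['species']} is one of "
        f"{summary['records_in_square']} records of this species in the same square."
    )
```

A side effect of this design is that seasoned recorders who never click the link never trigger the summary queries at all.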
Comment from Peter Brown:
That sounds like a very good suggestion (i.e. to have this as a two-stage
process). And Helen, I like your choice of wording below. It'll be excellent to
give recorders the opportunity to learn so much more about their record (if
they want to...) and is bound to encourage further recording.
Original comment by johnvanb...@gmail.com
on 19 Nov 2013 at 6:49
I just got back from a conference which was largely about natural language
generation so I thought I might give my 2 cents.
The examples you give are interesting and likely to interest recorders
(or at least those new to recording). However, they don't make the most of the
potential of NLG. We should aim not just to interest recorders but to
educate, focus and challenge them, so that instead of simply getting more
records, we get more records of higher quality.
One example from the conference was BeeWatch
(http://homepages.abdn.ac.uk/wpn003/beewatch/index.php?r=user/auth). Here,
part of the response message reads something like:
'You submitted your record as <species a> but it is in fact <species b>. You
correctly identified <trait 1>, <trait 2> and <trait 3>, which are shared by
these species. The traits you need to look out for to distinguish these are
<trait 4> and <trait 5>. For <trait 4>, <species a> is <attribute 4a> and
<species b> is <attribute 4b>. As for <trait 5>, <species a> is <attribute
5a> and <species b> is <attribute 5b>.'
The researchers showed that this (as opposed to just a thank-you message)
resulted in a significant improvement in ID skill and a big improvement in
volunteer retention. You also mention issues of extracting data to fill in the
blanks. This method requires very little data: simply a table of traits
for each species in your group of interest (this would be a big hurdle for some
groups, e.g. Diptera, but easy for others, e.g. ladybirds).
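To illustrate how little data the BeeWatch-style message needs, here is a minimal sketch driven only by a species-to-traits table. The table contents are placeholders rather than a real identification key, and the sketch assumes that traits with equal values are the ones the recorder "correctly identified" while traits with different values are the distinguishing ones; the function names are hypothetical.

```python
from typing import Dict, List

# Hypothetical trait table: species -> {trait: attribute}. Placeholder values
# stand in for a real identification key for one group.
TraitTable = Dict[str, Dict[str, str]]

TRAITS: TraitTable = {
    "species A": {"trait 1": "1", "trait 2": "2", "trait 3": "3", "trait 4": "4a", "trait 5": "5a"},
    "species B": {"trait 1": "1", "trait 2": "2", "trait 3": "3", "trait 4": "4b", "trait 5": "5b"},
}


def join_list(items: List[str]) -> str:
    # Produces "a", "a and b" or "a, b and c".
    if len(items) == 1:
        return items[0]
    return ", ".join(items[:-1]) + " and " + items[-1]


def misidentification_feedback(submitted: str, verified: str, traits: TraitTable = TRAITS) -> str:
    a, b = traits[submitted], traits[verified]
    common = [t for t in a if t in b]
    shared = [t for t in common if a[t] == b[t]]          # point to both species
    distinguishing = [t for t in common if a[t] != b[t]]  # tell them apart
    sentences = [f"You submitted your record as {submitted} but it is in fact {verified}."]
    if shared:
        # Assumption: the recorder got the shared traits right.
        sentences.append(f"You correctly identified {join_list(shared)}, which are shared by these species.")
    if distinguishing:
        sentences.append(
            f"The traits you need to look out for to distinguish these are {join_list(distinguishing)}."
        )
        for t in distinguishing:
            sentences.append(f"For {t}, {submitted} is {a[t]} and {verified} is {b[t]}.")
    return " ".join(sentences)
```

The only per-group effort is filling in `TRAITS`, which matches the point that this is easy for well-keyed groups and hard for groups such as Diptera.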
While this improves ID skills we can also improve spatial coverage by
motivating recorders to record where we need it most:
'Thanks for your record of <species a> from <location>. It is likely that this
species is also in <other location> (<link to map of other location>), but we
have not got any records from <other location>. If you are able to send in a
record of <species a> from <other location>, that would really help our
research.'
'Thanks for your record of <species a> from <location>. We also think <species
b> (<link to info on species b>) is likely to be in <location>, so please keep
an eye out for it next time you're out.'
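The first coverage message needs two inputs: where a species has been recorded and where it is thought likely to occur. A minimal sketch, assuming both are available as sets of location codes per species (for instance from a distribution model, which is an assumption here); the map URL template and function name are hypothetical.

```python
from typing import Dict, Optional, Set


def coverage_prompt(
    species: str,
    location: str,
    recorded: Dict[str, Set[str]],   # species -> locations with at least one record
    predicted: Dict[str, Set[str]],  # species -> locations where presence is likely (assumed input)
    map_url_template: str = "https://example.org/map/{loc}",  # hypothetical URL
) -> Optional[str]:
    # Gaps are locations where the species is expected but has no records yet.
    gaps = sorted(predicted.get(species, set()) - recorded.get(species, set()) - {location})
    if not gaps:
        return None  # fall back to the plain "thanks" message
    other = gaps[0]
    link = map_url_template.format(loc=other)
    return (
        f"Thanks for your record of {species} from {location}. "
        f"It is likely that this species is also in {other} ({link}), "
        f"but we have not got any records from {other}. "
        f"If you are able to send in a record of {species} from {other}, "
        "that would really help our research."
    )
```

Picking the first gap alphabetically is only a placeholder; a real system would more likely pick the gap nearest to the recorder.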
Some people like a challenge, or positive reinforcement, so you could think of
messages like:
'You just submitted your longest list: <length>. Complete lists, where you
record everything you see, are great for answering research questions, thanks!'
'You just recorded your <milestone number>th species, congratulations! Here's to
the next <milestone number>!'
'Your record for <location> is important because <location> is poorly recorded
(it is in the bottom <location percentile>). Recording in these areas really
helps improve the quality of our data.'
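The challenge and reinforcement messages mostly reduce to simple comparisons against the recorder's history or against other locations. A minimal sketch; the milestone thresholds, the percentile bands and the function names are all hypothetical, and `all_counts` is assumed to hold the record count of every location, including this one.

```python
from bisect import bisect_right
from typing import List, Optional, Sequence

MILESTONES = (10, 50, 100, 250, 500, 1000)  # hypothetical species-count milestones
BANDS = (5, 10, 25)                         # hypothetical "bottom X%" bands


def milestone_message(previous_total: int, new_total: int) -> Optional[str]:
    crossed = [m for m in MILESTONES if previous_total < m <= new_total]
    if not crossed:
        return None
    m = crossed[-1]  # report the largest milestone passed by this submission
    return f"You just recorded your {m}th species, congratulations! Here's to the next {m}!"


def longest_list_message(new_length: int, previous_lengths: Sequence[int]) -> Optional[str]:
    if not previous_lengths or new_length <= max(previous_lengths):
        return None
    return (
        f"You just submitted your longest list: {new_length}. Complete lists, where you "
        "record everything you see, are great for answering research questions, thanks!"
    )


def poorly_recorded_message(location: str, count: int, all_counts: List[int]) -> Optional[str]:
    if not all_counts:
        return None
    ordered = sorted(all_counts)
    # Percentage of locations with the same number of records as this one, or fewer.
    share = 100.0 * bisect_right(ordered, count) / len(ordered)
    for band in BANDS:
        if share <= band:
            return (
                f"Your record for {location} is important because {location} is poorly "
                f"recorded (it is in the bottom {band}%). Recording in these areas really "
                "helps improve the quality of our data."
            )
    return None
```

Returning `None` when nothing notable happened keeps these messages occasional, which matters given the concern that constant automated messages become noise.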
I think that there are definitely technical challenges in doing this type of
work, but I would argue that designing responses that make the most of the
opportunity NLG affords us will be just as difficult.
I should also say that I spoke to the guys at Aberdeen (from the computer
sciences department) who are behind a lot of their NLG work and they are keen
to foster collaborations, and this might be best achieved by taking on one of
their masters students whose projects start in January.
Original comment by tomaugus...@googlemail.com
on 13 Jun 2014 at 9:15
If you develop this, there must be an opt-out button. I really appreciate
getting messages from verifiers, but would definitely not want to receive
automated messages.
Be wary of including statements like 'this is a new county record' or 'this is a
new 10km record', because presumably these would only be based on the data held
on the NBN database and Indicia warehouse. There have been recent instances of
recorders publicly claiming they've got 'firsts' because there are no records
on the NBN Gateway, only to have it pointed out to them that there are records
in the literature.
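Both points can be handled in the message-building code itself: check an opt-out setting before generating anything, and phrase "firsts" in terms of what this database holds rather than as a new county record. A minimal sketch; the `RecorderPrefs` setting, the function names and the exact wording are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RecorderPrefs:
    # Hypothetical per-recorder setting backing an opt-out button.
    automated_feedback: bool = True


def no_other_records_note(species: str, county: str, records_held: int) -> Optional[str]:
    if records_held > 0:
        return None
    # Deliberately avoids "new county record": an empty query result only
    # describes this database, not the literature or other data holders.
    return (
        f"We don't hold any other records of {species} for {county} in this database. "
        "There may still be records in the literature or held elsewhere."
    )


def build_feedback(prefs: RecorderPrefs, species: str, county: str, records_held: int) -> Optional[str]:
    if not prefs.automated_feedback:
        return None  # recorder has opted out of automated messages
    return no_other_records_note(species, county, records_held)
```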
Original comment by PaulaNBN@gmail.com
on 5 Sep 2014 at 6:38
Original issue reported on code.google.com by
johnvanb...@gmail.com
on 18 Nov 2013 at 9:46