Indicia-Team / warehouse

GNU General Public License v3.0
Support for Creative Commons and other record licencing options #15

Open johnvanbreda opened 8 years ago

johnvanbreda commented 8 years ago

Currently, Indicia stores no information about the license associated with a record. It is implicit in the survey dataset how the data is to be managed and licensed.

Requirements:

  1. There needs to be a way of allowing users to select a license to be associated with their records in their user profile.
  2. The list of licenses available needs to be stored on the warehouse and it should be possible to define which licenses are available on each website.
  3. A license stored on the warehouse needs a full title, abbreviated code and link to the license description page.
  4. Specific to the Consultants Portal or other sites managing commercial projects - a project (recording group or activity) should be able to force the license for records captured to a license chosen by the administrators. If so, then when adding records to the project/recording group/activity if the license is different to the user's chosen one, then a warning is shown on the record entry page.
  5. Specific to the Consultants Portal or other sites managing commercial projects - If there is a parent/child relationship between groups (e.g. consultancies->projects) then the child project should inherit the parent's license when it is being set up.
  6. The license code should be shown on record details pages with a link to the license. We should adopt standard ways of representing the license: https://wiki.creativecommons.org/wiki/Marking_your_work_with_a_CC_license.
  7. The license code should be included in downloads.
  8. On login, if you haven't chosen a default license for your account, then prompt you to do so. This will warn the user that it will update existing records for their account (excluding records that have a license forced by the project).
  9. When a recorder changes the license on their account having selected one previously, existing records are NOT updated. Once a record has been licensed, that remains fixed.
  10. Specific to the Consultants Portal or other sites managing commercial projects - likewise, a project/recording group/activity admin could change the license on a project but only for new records - existing records would not be updated.
  11. There is no requirement to control licensing on a per-survey dataset level, since these are not commercial entities and therefore the user should be able to select their own license.
  12. Images attached to records adopt the license of the parent record.
  13. For new registrations, a system can be configured to give the user a default license which they can change.
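As an illustration only, requirements 1, 3 and 4 could fit together roughly as sketched below. This is Python rather than Indicia's own code. The names `Licence`, `Resolution` and `licence_for_new_record` are hypothetical, and the version 4.0 licence URLs are an assumption, since the requirements do not name a CC version.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Licence:
    """Requirement 3: full title, abbreviated code and link to the description."""
    title: str
    code: str
    url: str


@dataclass
class Resolution:
    licence: Optional[Licence]  # licence to store against the new record
    warning: Optional[str]      # message to show on the record entry page


def licence_for_new_record(user_licence: Optional[Licence],
                           project_licence: Optional[Licence]) -> Resolution:
    """Pick the licence for a record at the moment it is entered.

    A licence forced by the project wins (requirement 4); otherwise the
    user's own choice from their profile applies (requirement 1).
    """
    if project_licence is None:
        return Resolution(user_licence, None)
    if user_licence is not None and user_licence != project_licence:
        msg = (f"Records in this project are licensed {project_licence.code}, "
               f"not your chosen {user_licence.code}.")
        return Resolution(project_licence, msg)
    return Resolution(project_licence, None)


# Assumed example licences (CC version 4.0 is an assumption).
CC_BY = Licence("Creative Commons Attribution", "CC-BY",
                "https://creativecommons.org/licenses/by/4.0/")
CC_BY_NC = Licence("Creative Commons Attribution-NonCommercial", "CC-BY-NC",
                   "https://creativecommons.org/licenses/by-nc/4.0/")

result = licence_for_new_record(user_licence=CC_BY, project_licence=CC_BY_NC)
print(result.licence.code)
print(result.warning)
```

Because the licence is resolved once, when the record is entered, a later change to the user's profile would not touch the stored value, which is the behaviour requirement 9 asks for.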

Questions:

JimBacon commented 8 years ago

Crikey, I thought Indicia was part of a drive towards open access to biological records. Where has this requirement come from?

DavidRoy commented 8 years ago

It still is. This requirement is to give greater clarity (and freedom) by aligning with Creative Commons - with a default of CC (or CC-BY). Currently, it's not clear that we can supply images to others (e.g. to Atlas of Living x,y,z).

DavidRoy commented 8 years ago

Some comments from me:

  1. Can we set a default license, with no warning. This is equivalent to agreeing to T&Cs?

Response to questions:

  1. Should extend to images and be the same license as for the rest of the record. If users want to protect images they should not include them with records in my view.
  2. I assume this is needed, as how would licenses be set for surveys that don't have activities/groups etc?
  3. Original license applies, as with other retrospective changes to license option?

johnvanbreda commented 8 years ago

"Can we set a default license, with no warning. This is equivalent to agreeing to T&Cs?"

Licenses for surveys - don't forget that the user's default license would apply. In most cases I think it is the recorders' prerogative to choose the license (especially if we are only allowing CC variants). It's probably only in the commercial world where you are being paid by an organisation that the organisation has some right to select the appropriate license.

johnvanbreda commented 8 years ago

I've added an extra question to the list about the meaning of the term commercial with respect to LRCs.

If we use an "original license applies" approach, then we get an inconsistency. 2 scenarios:

  1. Recorder has entered a load of records for project X. Recorder sets the license to A. Then the project X admin sets the license to B.
  2. Recorder has entered a load of records for project X. The project X admin sets the license to B. Then the recorder sets the license to A.

These 2 scenarios which are only differentiated by timing result in the records having a different license.
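The inconsistency can be shown with a few lines of Python. This is only a sketch, assuming that "original license applies" means a licence is written to a record only if the record has none yet. The function name `apply_licence` is hypothetical.

```python
from typing import List, Optional


def apply_licence(records: List[Optional[str]], licence: str) -> List[Optional[str]]:
    """Licence only the records that do not have one yet ("original applies")."""
    return [current if current is not None else licence for current in records]


# Records entered for project X before anyone has chosen a licence.
unlicensed = [None, None, None]

# Scenario 1: the recorder picks A first, then the project admin picks B.
scenario_1 = apply_licence(apply_licence(unlicensed, "A"), "B")

# Scenario 2: the project admin picks B first, then the recorder picks A.
scenario_2 = apply_licence(apply_licence(unlicensed, "B"), "A")

print(scenario_1)  # ['A', 'A', 'A']
print(scenario_2)  # ['B', 'B', 'B']
```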

DavidRoy commented 8 years ago

"Can we set a default license, with no warning. This is equivalent to agreeing to T&Cs?" I didn't read #8 closely enough! I meant for new registrations only but I see that you refer to 'login'. I agree about not forcing a change on recorders' existing data.

Licenses for surveys - good point. In that case, no need for Survey level?

JimBacon commented 8 years ago

Why does it need all this complexity to clear up what you say is a lack of clarity? Why not just improve the wording of the terms and conditions?

DavidRoy commented 8 years ago

How would we deal with the legacy issue/manage a change in T&Cs?

johnvanbreda commented 8 years ago

Jim, there is an expectation that licensing will follow open standards - this has to be a good thing since knowing that a record is CC or CC-BY for example is far easier to communicate than a page of T&Cs, which might be different per recording website. At the moment there is no association between the T&Cs on iRecord and the resulting records on the NBN Gateway.

We could simply declare all records to be a particular variant of CC, state this in the T&Cs, then attach this to all records going to the Gateway, but this is not what many users have come to expect, having had the option to use a choice of CC variants on sites like Flickr for example.

I think in general, for nearly all Indicia projects, the scenario is fairly simple.

The complexity comes in for systems that manage commercial projects, i.e. the Consultants Portal, where the project manager might dictate the license for a project to meet client needs.

johnvanbreda commented 8 years ago

David - a change in T&Cs might not affect the license - in which case this is an existing issue but there are Drupal tools for asking people to reconfirm acceptance if this does come up. If it does affect the license and we need to force a license change on existing records, then I would argue that this is a very unusual circumstance and one which will need to be communicated to the recorders and would probably result in some manual queries as well as broadcasting of the change to other data users.

JimBacon commented 8 years ago

  1. "There is an expectation that licensing will follow open standards". Whose expectation is that? Is this flowing down from the NBN strategy? Is that the starting point for this requirement?
  2. Open standards may be good but you can have open standards for closed data. This looks at first glance like a way of closing things down rather than opening them up.
  3. At the moment the iRecord T&Cs say "Verified observations will be made available via the NBN Gateway. If you do not want your observations to be made available in this way, please do not submit them. Once verified records have been made available, their use will be governed by the NBN Gateway Terms and Conditions"
  4. Indicia is not Flickr. The aims of biological recording are environmental and scientific. I don't think there is a need, expectation or benefit in allowing e.g. iRecord users to choose to restrict the utility of their records.
  5. How is a commercial project manager choosing a licence different from any other user choosing a licence?
  6. Do we need to deal with legacy data?

Other options to consider.

  1. Single licence determined at warehouse level with no legacy data. Set up a new warehouse. All websites that use it agree to the licence and require all users to agree to it. No development work required.
  2. Single licence determined at warehouse level with legacy data. Get all websites to agree to licence. At login to any website users agree to licence for future records and, optionally, past records for this and all other websites. A date of effect stored against the user determines which records are licensed.
  3. Single licence determined at website level with legacy data. At login to a participating website users agree to licence for future records and, optionally, past records for this website alone. A date of effect stored against the users_websites table determines which records are licensed.
  4. Multiple licences determined at website level with legacy data. As 3 but a licence field is also needed in the websites table to determine the licence used by that website

I could go on with more but my point is to suggest there are simpler possibilities.
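The "date of effect" idea in options 2 and 3 might look something like the following Python sketch. The function names are hypothetical, and using `date.min` to mean "include past records" is an assumption made for the example rather than anything proposed in the thread.

```python
from datetime import date
from typing import Optional


def accept_licence(today: date, include_past_records: bool) -> date:
    """Return the date of effect to store against the user (or users_websites row)."""
    # date.min stands in for "from the beginning", so legacy records are covered.
    return date.min if include_past_records else today


def is_licensed(record_created: date, date_of_effect: Optional[date]) -> bool:
    """A record is licensed once the user has agreed and it falls on or after the date."""
    return date_of_effect is not None and record_created >= date_of_effect


old_record = date(2015, 6, 1)
new_record = date(2016, 9, 1)
agreed_on = date(2016, 3, 15)

future_only = accept_licence(agreed_on, include_past_records=False)
with_legacy = accept_licence(agreed_on, include_past_records=True)

print(is_licensed(old_record, future_only))  # False
print(is_licensed(new_record, future_only))  # True
print(is_licensed(old_record, with_legacy))  # True
print(is_licensed(new_record, None))         # False
```

With this design nothing has to be stored on the record itself: whether a record is licensed is derived from its creation date and the user's date of effect.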

johnvanbreda commented 8 years ago

  1. The NBN have asked me to implement user selection of CC licenses as I believe this is needed to fit in with Atlas of Living x. I have also had a similar requirement suggested from elsewhere, e.g. Luxembourg MNHN. GBIF are also working towards a similar goal.
  2. I don't think that is correct. This is a mechanism for providing clarity over the license associated with a record. Records already have some form of implicit license (via the T&Cs) but clarity is poor. A license can make a record more or less open - that is not really anything to do with the requirement, although note that it is the intention here to encourage the use of open licensing not closed. Also a license is not just a reflection of record "openness" but also a request for attribution.
  3. iRecord is currently using something akin to a CC-NC license (considering that LRCs are not commercial in the iRecord definition). The Gateway is also using something akin to a CC-NC license. Therefore by attaching a proper CC license to each record we are not closing the record to any existing use, we are simply making the license situation 100% clear. Allowing the user to request attribution in their license is not really closing the record. The only other option they will have is to remove the NC component of the license, thus opening it up not restricting its utility. Forcing all records to allow commercial usage might alienate various users including some record centres.
  4. Having a project manager choose the license causes some issues in terms of resolving potential disputes etc, but the end result is exactly the same.

I agree there are simpler ways to attach a CC license to the records, however any approach we take here should be sensitive to the wishes of both recorders and record centres, so a modicum of flexibility and choice may be hard to avoid. As the copyright holder of the record it seems logical that the recorder should be the primary decision maker with respect to licensing.

But of course, I'm just the developer putting the other side of the argument - simple is always good, and it's up to David and Ella to decide which way to proceed.

JimBacon commented 8 years ago

Ah well, you have offered what has been asked for then. Seems a lot of work to overcome some out of date T&Cs and a perceived need for individual choice, especially if the only choices are CC-BY and CC-BY-NC. (I'm not sure there is a CC without attribution.)

The recorder is always the primary decision maker, even if the licence is chosen for them, because they can always choose not to submit a record.

DavidRoy commented 8 years ago

Jim is right to push for simplicity, but I can't see an alternative if we want to change the license to NC-BY for legacy data (desirable) and give users an opt-out (essential). But if the development costs are prohibitive, we'll have to think again.

JimBacon commented 8 years ago

I think one of my alternatives would have achieved what you are asking for in that comment, David. Obviously your brevity has understated the requirement. I'll shut up and get on with what I am supposed to be doing.

johnvanbreda commented 8 years ago

Please don't (shut up) - it never hurts to thrash things out from another perspective.

BUT, I don't think that the difference in complexity between the 2 solutions is that huge. Ignoring the Consultants Portal specific items, here is the list of requirements again with the alternative approach described alongside:

  1. There needs to be a way of allowing users to select a license to be associated with their records in their user profile. Slightly simpler - there needs to be a way of allowing a user to turn on the proposed license in their user profile
  2. The list of licenses available needs to be stored on the warehouse and it should be possible to define which licenses are available on each website. Not required, but adding a licenses entity and a config screen in the warehouse is a fairly simple task
  3. A license stored on the warehouse needs a full title, abbreviated code and link to the license description page. As above.
  4. n/a
  5. n/a
  6. The license code should be shown on record details pages with a link to the license. We should adopt standard ways of representing the license: https://wiki.creativecommons.org/wiki/Marking_your_work_with_a_CC_license. Same requirement, update to report queries
  7. The license code should be included in downloads. Same requirement
  8. On login, if you haven't chosen a default license for your account, then prompt you to do so. This will warn the user that it will update existing records for their account (excluding records that have a license forced by the project). Roughly the same requirement
  9. When a recorder changes the license on their account having selected one previously, existing records are NOT updated. Once a record has been licensed, that remains fixed. Not applicable, but this does not describe additional code anyway
  10. n/a
  11. There is no requirement to control licensing on a per-survey dataset level, since these are not commercial entities and therefore the user should be able to select their own license. No code required either way
  12. Images attached to records adopt the license of the parent record. Same requirement - but no code required anyway
  13. For new registrations, a system can be configured to give the user a default license which they can change. Requirement not applicable to simplified version

So it's only 1, 2 and 13 that are different. 1&2 are fairly trivial (adding up to no more than 2-3 hours effort for the difference) and 13 is a would-be-nice anyway. For the full version of this, excluding Consultants Portal specifics, I think this is approx 1.5 days work.

JimBacon commented 8 years ago

Okay, back again. I am thinking about it from the perspective of how the database has to change along with corresponding queries. Also what the impact is on the client websites and the user.

I'm using your premise that the licence is the user's choice rather than one made for a survey or website.

  1. The users_websites table needs an additional field for licence. The user has to decide for each website which licence they wish to use. Simpler. The users table needs an additional field. The user makes a single decision about which licence to use across all websites.
  2. Adds a licences table and a licences_websites table. Every website administrator has to choose their licences to offer and every client website has to be modified to present these options. Simpler. An update to iRecord alone would provide a mechanism to update records across all websites.
  3. Adds a licence field to the occurrence table and updates all occurrences by that user. Simpler. No change.
  4. Every client website has to provide a means to allow a change of licence and update the warehouse with the change of preference. Simpler. No change.
  5. Terms and conditions of all websites should be updated in both cases.

Additionally, when submitting new records, the currently selected licence has to be looked up and stored against the occurrence. Simpler. No change.

In both cases the licence may be a field worth adding to cache_occurrences. This would be a licence_id requiring an additional join when reporting. Simpler. A boolean with no join.

johnvanbreda commented 8 years ago

I'm not quite sure I understand. I think you are saying that a simpler version of my proposal is to allow the user to choose but that their choice would apply to all their records on the warehouse, not on a per-website basis. If so, then apart from the move of the license field from the users table to the users_websites table, I don't see this as a significant change in terms of complexity. In response to your points above:

  1. I could imagine a single decision being a point of contention. If users log on to iRecord then CC license their records, they might be surprised to find that their Moth Night records have also been CC licensed (and Atropos might not like this either). We would have to extend the complexity here to ensure that the user is made aware of all websites that they have contributed to which this will affect.
  2. We still need a licenses table (unless we ignore the NBN requirement for a choice of licenses). We could simplify it by removing the licenses_websites table for either version of the proposed change, if we accept that the full list of licenses added to the warehouse are available for all websites.
  3. Are you saying that nothing needs to be added to the occurrences table in the simpler case? If a user toggles between licenses, then we need to capture the state at the point of each record being entered, otherwise we are allowing users to change the license on records already in the wild and we'd have problems disseminating these changes. So I think having an occurrences.licence_id field is required either way (or occurrences.license_enabled boolean if we allow only 1 license).
  4. As for number 1, I think the licensing options are better presented in the user profile of all participating websites, as users are not often aware that all the different sites they use share the same warehouse (and therefore license).

I really don't see this as a huge change in complexity and it would be a shame not to do it right.
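Point 3 above, capturing the licence at the moment each record is entered, can be sketched as follows. This is an illustrative Python model, not the warehouse schema: the `User` and `Occurrence` classes are hypothetical, and the numeric ids simply stand for rows in an assumed licenses table.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Occurrence:
    taxon: str
    licence_id: Optional[int]  # copied from the user at entry time, never updated


@dataclass
class User:
    licence_id: Optional[int] = None
    occurrences: List[Occurrence] = field(default_factory=list)

    def add_occurrence(self, taxon: str) -> Occurrence:
        # Snapshot the user's current licence onto the new record.
        occ = Occurrence(taxon, self.licence_id)
        self.occurrences.append(occ)
        return occ


user = User(licence_id=1)   # e.g. 1 = CC-BY in the assumed licenses table
first = user.add_occurrence("Coccinella septempunctata")
user.licence_id = 2         # the user later switches, e.g. to CC-BY-NC
second = user.add_occurrence("Adalia bipunctata")

print(first.licence_id, second.licence_id)  # 1 2
```

Had the licence been looked up through the user at report time instead, the first record would silently have changed licence when the user changed their profile.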

JimBacon commented 8 years ago

You understood my meaning correctly. Your proposal is that a website should first select a list of licences to offer and then a user is free to keep switching back and forth between licences on a per-website basis.

By way of illustrating the opposite end of the spectrum, my suggestion is that there is a single licence offered, that it is solely the user's choice whether to accept it, and that it is a one-way decision (optionally including legacy records).

  1. Agree.
  2. David has not expressed a requirement for a choice of licences.
  3. Nothing needs adding to the occurrences table in the simpler case because there is no toggling between licences. There is just a date from which a user has licensed records.
  4. You are probably right but will there be a problem getting all those client websites to implement the necessary changes? Although perhaps we could easily get them to link to a warehouse page which allowed users to manage their licences in one place (like notifications). That would probably need users to authenticate against the warehouse though.

Given you think the more complex version is only a day and a half to implement that doesn't sound bad at all. Just because the warehouse supports all manner of possibility doesn't mean it has to be employed.

Does your estimate include changes to the client side as well?

johnvanbreda commented 8 years ago

For point 2, this requirement came from Ella Vogel at the NBN, not David. For point 4, websites will only come on board when they choose to do so. They will need some new code and perhaps configuration (though we could default this). However, I don't think it's going to be considered well-mannered to change the license of records on a website without some change to that website to make it clear to the user, so I think having to change each website is inevitable.

My estimate did include changes to the client side, though I'd only do Drupal 7 (perhaps covering D8 from different funding, and ignoring D6).

EllaVogel commented 8 years ago

Hi all, thanks for the interesting conversation. Apologies for my lack of contribution until now.

It feels important to go back to the very first comment. It seems that John has very clearly captured what needs to be done in order to assign licences to all records and I think it is important that we don’t lose sight of this.

Jim, by way of a little background to the need to assign data licenses...
Over the last 12 months the NBN Secretariat has undertaken a review of data licensing on the NBN Gateway. This was done at a number of workshops and through questionnaires completed by NBN members and Data Partners. The focus of this has been to improve options for data sharing and facilitate increased data use. As David said, having a clear suite of licenses ultimately opens the use of data, as users can clearly identify which data they can use and which they need to avoid or seek express permission to use. The current Terms and Conditions on the Gateway, with ability to add additional constraints as set by the data provider, make it nearly impossible for someone to use an aggregation of datasets as each may have a different ‘bespoke’ license.

One of the main requests from NBN Data Partners has been the ability to assign a data license to their own datasets. The NBN Gateway has now been changed to allow this and four data licenses are available. Data partners can now give their datasets a license via the dataset metadata page on the NBN Gateway. You can read more about this stream of work here: http://www.nbn.org.uk/News/Latest-news/Data-Licensing-on-the-NBN-Gateway.aspx#sthash.aniv5UBG.dpuf

The data license options currently available on the NBN Gateway are Open Government License (OGL), CC0, CC-BY, CC-BY-NC. If there is a requirement to add further licenses we can look into this in due course. These licenses will also be available, as John said, on the Atlas platforms.

For the two questions posed by John: 1) If a project admin sets the license on an existing project, then what happens to existing records that are already licensed by the recorders? The licence should stay as it was originally assigned. I don’t think we should be retrospectively changing licences.

2) Do we need to define where a record centre sits in respect to the term "commercial"? I could imagine wanting a CC-BY-NC (*but allowing access to record centres) license. Yes, this is something we will be looking at over the next few weeks. To date there has been no objection to the four chosen data licenses on the NBN Gateway and the license particulars are all available for users to read and give due consideration to before assigning a given license. However, perhaps we do need to develop a LERC license – I will discuss this further with the rest of the NBN Secretariat.

Are there any main points here that I have missed and need addressing? I’m conscious that we still need to discuss resourcing this as it sounds like there are elements that reach further than the Consultants Portal.

JimBacon commented 8 years ago

Hi Ella,

Thanks for the info. My initial response was one of surprise because it appeared in the issue queue without me being aware of the background. John has convinced me that his solution is reasonable because it allows full flexibility with very little development effort.

However, I am not yet clear that this full flexibility, putting the licence choice in the recorders' hands and allowing them to keep changing it, is necessary or desirable.

johnvanbreda commented 8 years ago

Ok, so I think we are now a bit clearer where we are coming from. Trying to summarise a bit - from the perspective of iRecord, I think Jim's solution could meet the NBN requirements. We could have a single flag in the user profile which the user can tick to "accept CC-BY" licensing. Once ticked it cannot be unticked. All reports that need to grab the license state of a record would join to the users table (or possibly users_websites if it is done on a per-website basis). We could do this in the cache building to minimise the effect on reporting performance. Therefore the iRecord dataset on the Gateway could be split into a dataset with the current unclear license, plus a second dataset with CC BY licensing. Positives of this approach are its simplicity. Negatives are:

The alternative approach is to provide a configuration table in the warehouse which lists available licenses (OGL, variants of CC etc). The user can select a license and store it in their profile. This license is then assigned to all records that they enter going forward (and optionally all prior unlicensed records). They can change license at any point but only for records going forward.

The negatives of this approach are:

Positives of this approach are:

My opinion is that this requirement is all about opening records up. Therefore if we feel that by offering a wider choice to users than a single license option we'll get more open records, then we need to factor in choice.

JimBacon commented 8 years ago

Hi John. As I said, you've won me round to your way of implementing it on the warehouse since you could do it in less time than we have been talking about it.

It would then be down to different websites to decide on the offering they would make to their users with, I think, every permutation being a possibility from the user changing their licence every day to the user making a one-off choice to accept a single licence.

DavidRoy commented 8 years ago

I'm happy to go with John's suggestion as the extra complexity is required. Thanks

EllaVogel commented 8 years ago

From a Consultants Portal point of view, I am happy that we have decided that an option of licenses is the best way forward.

The infrastructure will then be there for other platforms, such as iRecord, to go down the same route if they wish, or to provide a simpler one-option license.

Thanks for everyone's input.

JimBacon commented 8 years ago

I am just reading the Atlas of Living Scotland Terms of Use (http://www1.als.scot/terms-of-use/). It says "Note in some cases Content may be in the public domain, in the sense that it is not subject to copyright protection because it does not qualify for copyright, eg individual species sightings"

That explains why, on their recording form, the licence selection only refers to uploaded images.

Do we disagree with this statement?

EllaVogel commented 8 years ago

Thanks for raising this Jim. For the ALA there’s a licence associated with the dataset of records (Creative Commons Attribution 3.0).

http://collections.ala.org.au/public/show/dr364

We haven’t specified this in Atlas of Living Scotland yet but will be making it clearer on the page that records submitted through this route will be assigned such a license.

Each individual image associated with a record does have a separate licence.

Any more feedback on the site would be appreciated.

JimBacon commented 8 years ago

Having a licence on a dataset is consistent with what is being done on the NBN Gateway (http://nbn.org.uk/News/Latest-news/Data-Licensing-on-the-NBN-Gateway.aspx).

I can see that the proposal put forward here for Indicia, of allowing users to choose licences per record (assuming this is valid), could come into conflict with licensing at the dataset level.

Say I submit a ladybird record to iRecord with the Ladybird mobile app. In the new world of user selected licensing, I pick CC-BY-NC. In the meantime, let us imagine that the Ladybird Recording Scheme has taken up the new licensing options offered by the NBN and is applying CC0 to its dataset.

This effectively prohibits my record from being added to the Ladybird Recording Scheme dataset because the licence I have chosen is more restrictive than the licence chosen by the dataset administrator.

The undesirable consequences are that either my record cannot be added to the LRS data set or the LRS administrator opts for the most restrictive licence to allow inclusion of all available records.
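The conflict can be expressed as a simple ordering check. The Python sketch below assumes that the three CC options mentioned in this thread can be ranked from least to most restrictive. That ranking, the function names, and the omission of OGL are simplifications made for illustration only, not a statement of how the licences interact legally.

```python
from typing import Iterable

# Hypothetical ranking from least to most restrictive (illustration only).
RESTRICTIVENESS = {"CC0": 0, "CC-BY": 1, "CC-BY-NC": 2}


def can_join_dataset(record_licence: str, dataset_licence: str) -> bool:
    """A record fits only if its licence is no more restrictive than the dataset's."""
    return RESTRICTIVENESS[record_licence] <= RESTRICTIVENESS[dataset_licence]


def dataset_licence_needed(record_licences: Iterable[str]) -> str:
    """Licence the dataset would need so that every record can be included."""
    return max(record_licences, key=RESTRICTIVENESS.__getitem__)


# The ladybird example: a CC-BY-NC record and a CC0 dataset.
print(can_join_dataset("CC-BY-NC", "CC0"))                    # False
print(dataset_licence_needed(["CC0", "CC-BY-NC", "CC-BY"]))   # CC-BY-NC
```

The second function shows the other undesirable outcome: to keep every record, the dataset ends up with the most restrictive licence any recorder has chosen.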

johnvanbreda commented 8 years ago

Would appreciate thoughts from David and Ella before I continue on this development (which I've already started). As Jim says, having a licence selectable by the user seems at first glance to be incompatible with the dataset level licensing of the Gateway and ALA. Some possibilities:

The first option here is more or less exactly the same as the proposed development but where the dataset administrator only enables a single licence option. Therefore we could complete the current development without losing this possibility. However although this option matches the expectation of licencing on the ALA, it does possibly mean that some records will be lost if the recorder disagrees with the dataset administrator's choice.

EllaVogel commented 8 years ago

Thanks for raising this as an issue. I think that as far as the Consultants Portal goes, the licence is set at project level by whoever sets up the project. A large majority of the cases on the Consultants Portal will be that people contributing records to one project will all be from the same consultancy, so will be abiding by the rules set by their organisation.

If there is an instance where the user disagrees with the licence set for the project, I would assume that that is something that would be worked out by the consultancy and their consultants. When ad hoc records are added, the user can select a licence for the specific record, but will have to re-select each time they add individual records.

I therefore think that the best course of action here is to, as has been suggested, not give a licence option when adding to a project already set up. All records added to a project will agree with the licence that is assigned to the project. For all instances of records being added not under a project, the user will choose their own licence.

We have not got a big budget to be spending on these developments, and it is not our intention to be developing the whole system here, but merely developing the Consultants Portal so that it can manage data licences and can subsequently allow these licences to stay with the record through to the NBN Gateway. Hopefully, what is being developed here can act as a basis for developing the iRecord system and other systems when time and budgets allow, but, unfortunately, for now we have to focus only on what will benefit the Consultants Portal.

I hope that I have interpreted this issue correctly and have provided a useful response.

DavidRoy commented 8 years ago

John. Given Ella's sensible reiteration of the background to this, I suggest you implement the solution for the Consultants Portal.

We can then review how we implement this for iRecord once budget becomes available. One option is:

  1. We set a default licence for iRecord. CC-BY or CC0
  2. Individual users can set their own licence which differs from the default, but we advise against doing this
  3. We export data to the NBN Gateway/ALS based on dividing up records into CC0, CC-BY, CC-NC etc
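Step 3 amounts to grouping records by licence code before export. A minimal Python sketch follows, assuming records arrive as `(id, licence code)` pairs and that records with no user-chosen licence fall back to the site default. `DEFAULT_LICENCE` and `split_by_licence` are hypothetical names, and CC-BY is picked here only because the choice between CC-BY and CC0 is left open above.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple

DEFAULT_LICENCE = "CC-BY"  # assumed site default; CC-BY or CC0 is left open above


def split_by_licence(
    records: Iterable[Tuple[int, Optional[str]]]
) -> Dict[str, List[int]]:
    """Group record ids by licence code, falling back to the site default."""
    batches: Dict[str, List[int]] = defaultdict(list)
    for record_id, licence in records:
        batches[licence or DEFAULT_LICENCE].append(record_id)
    return dict(batches)


records = [(1, None), (2, "CC0"), (3, "CC-BY-NC"), (4, None), (5, "CC0")]
print(split_by_licence(records))
# {'CC-BY': [1, 4], 'CC0': [2, 5], 'CC-BY-NC': [3]}
```

Each resulting batch could then be exported as its own dataset with a single, unambiguous licence.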
johnvanbreda commented 8 years ago

I have now implemented enough for this to be used on the Consultants Portal. From the original requirements list, this covers:

  1. The list of licenses available needs to be stored on the warehouse and it should be possible to define which licenses are available on each website.
  2. A license stored on the warehouse needs a full title, abbreviated code and link to the license description page.
  3. Specific to the Consultants Portal or other sites managing commercial projects - a project (recording group or activity) should be able to force the license for records captured to a license chosen by the administrators. If so, then when adding records to the project/recording group/activity if the license is different to the user's chosen one, then a warning is shown on the record entry page.
  4. The license code should be shown on record details pages with a link to the license. We should adopt standard ways of representing the license: https://wiki.creativecommons.org/wiki/Marking_your_work_with_a_CC_license.
  5. The license code should be included in downloads.
  6. When a recorder changes the license on their account having selected one previously, existing records are NOT updated. Once a record has been licensed, that remains fixed.
  7. Specific to the Consultants Portal or other sites managing commercial projects - likewise, a project/recording group/activity admin could change the license on a project but only for new records - existing records would not be updated.
  8. There is no requirement to control licensing on a per-survey dataset level, since these are not commercial entities and therefore the user should be able to select their own license.
  9. Images attached to records adopt the license of the parent record.

I also have code (not yet committed) for adding a licence selection control to the user profile editing form.

johnvanbreda commented 3 years ago

Closing as now implemented.