UniversalDataTool / universal-data-tool

Collaborate & label any type of data, images, text, or documents, in an easy web interface or desktop app.
https://universaldatatool.com
MIT License
1.93k stars, 191 forks

Active Learning #148

Open seveibar opened 4 years ago

seveibar commented 4 years ago

This is for tracking interest in Active Learning, which means automatically annotating new samples based on the labels that have been completed so far. Please :+1: if this is something you'd like.

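From an implementation perspective, I'm thinking we enable active learning in two scenarios:

  • The user is using the desktop application
  • The user has specified a web server that can do the active learning through an API we define, and for which we provide a sample active learning Docker container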

Samples that were automatically annotated and not reviewed are marked in a different color.

Curious how people think this could be implemented, please comment with thoughts!
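
As a starting point for discussion, here is a minimal sketch of the "automatically annotated and not reviewed" idea. Everything in it is an assumption for illustration: the `Sample` fields, the `predict` callback (any function returning a label and a confidence), the `min_confidence` threshold, and the color names are hypothetical and not part of UDT.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical display colors; the thread only says "a different color".
COLOR_CONFIRMED = "green"
COLOR_AUTO_UNREVIEWED = "orange"
COLOR_UNLABELED = "gray"


@dataclass
class Sample:
    data: str
    label: Optional[str] = None
    auto_annotated: bool = False  # label came from a model, not a person
    reviewed: bool = False        # a person has confirmed or corrected it


def auto_annotate(samples, predict, min_confidence=0.8):
    """Fill in labels for unlabeled samples when the model is confident enough."""
    for sample in samples:
        if sample.label is not None:
            continue  # never overwrite an existing label
        label, confidence = predict(sample.data)
        if confidence >= min_confidence:
            sample.label = label
            sample.auto_annotated = True
    return samples


def display_color(sample):
    """Pick the color used to show a sample in the labeling interface."""
    if sample.label is None:
        return COLOR_UNLABELED
    if sample.auto_annotated and not sample.reviewed:
        return COLOR_AUTO_UNREVIEWED
    return COLOR_CONFIRMED
```

Keeping `auto_annotated` and `reviewed` as separate flags means a reviewed machine label can still be traced back to the model later, which may be useful when measuring how often suggestions get corrected.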

seveibar commented 4 years ago

We may want to piggyback on some big projects working in this space, like fast.ai and modAL.

david-waterworth commented 4 years ago

This is something I'm interested in. In addition to the two scenarios you've identified, I think another mode would be for universal-data-tool to act as a service for an active learning framework such as modAL, rather than the other way around. In that case, you would implement the interface that modAL requires.

I don't know which is best (or even feasible) at this stage. I'm still in the evaluation process myself, and I've identified both modAL and universal-data-tool as potential components of a solution. As a researcher, I have the additional requirement that I need to be able to experiment with each component, so I need something modular.

I'd also suggest that, whatever the solution, you probably want to use queues between, say, modAL and universal-data-tool for robustness.
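
To make the queue suggestion concrete, here is a rough sketch using Python's standard `queue` and `threading` modules. The in-process queues stand in for whatever message broker a real deployment would use, and `run_labeling_service`, `request_labels`, and `label_fn` are hypothetical names, not part of modAL or universal-data-tool.

```python
import queue
import threading


def run_labeling_service(requests, responses, label_fn):
    """Stand-in for the labeling tool: answer label requests until told to stop."""
    while True:
        item = requests.get()
        if item is None:  # sentinel: no more requests
            break
        sample_id, data = item
        responses.put((sample_id, label_fn(data)))


def request_labels(samples, label_fn):
    """Send samples through a request queue and collect labels from a response queue."""
    requests = queue.Queue()
    responses = queue.Queue()
    worker = threading.Thread(
        target=run_labeling_service, args=(requests, responses, label_fn)
    )
    worker.start()

    for sample_id, data in samples.items():
        requests.put((sample_id, data))
    requests.put(None)  # tell the worker to finish
    worker.join()

    labels = {}
    while not responses.empty():
        sample_id, label = responses.get()
        labels[sample_id] = label
    return labels
```

Because the learner and the labeling side only share the two queues, either one could be restarted or swapped out without the other needing to know, which is the robustness argument for this design.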

seveibar commented 4 years ago

Hey @david-waterworth,

I'm really interested to hear what you're building, because I try to build the product around user projects. Are you open to sharing some more details about how you would use UDT and modAL together? Is it an image segmentation application? (Same goes for anyone else following this thread :)

Definitely agreed on queuing and transparent background process/server process management.

We're hoping to build something for this in the next month or so, so any feedback is appreciated!

david-waterworth commented 4 years ago

Hi @seveibar

I'm happy to share. My project is primarily an Entity Recognition and Relationship Extraction problem given a character sequence. The application is converting legacy building cyber-physical sensor metadata to a standard ontology to enable portable applications. Building sensors tend to have human-generated identifiers, e.g. "Rm1.ZnT" for the zone temperature sensor located in room 1. So the first stage is to use active learning to extract the tags from the raw name. Ultimately I want to group the tags and learn the relationships, ideally using a single classifier. Currently, this is a multi-stage problem.
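
For readers unfamiliar with this kind of task, one common way to represent character-level entity labels is as spans that get expanded into per-character B/I/O tags. The sketch below shows that on the "Rm1.ZnT" example; the label names (`room`, `room_id`, `zone`, `temperature`) and the span boundaries are my own guesses for illustration, not the actual ontology.

```python
def spans_to_char_tags(text, spans):
    """Expand (start, end, label) spans into one B-/I-/O tag per character.

    `end` is exclusive, as in Python slicing.
    """
    tags = ["O"] * len(text)
    for start, end, label in spans:
        if not (0 <= start < end <= len(text)):
            raise ValueError(f"span ({start}, {end}) is outside the text")
        tags[start] = "B-" + label
        for i in range(start + 1, end):
            tags[i] = "I-" + label
    return tags


name = "Rm1.ZnT"
# Hypothetical annotation: "Rm" = room, "1" = room id, "Zn" = zone, "T" = temperature.
spans = [(0, 2, "room"), (2, 3, "room_id"), (4, 6, "zone"), (6, 7, "temperature")]

for char, tag in zip(name, spans_to_char_tags(name, spans)):
    print(char, tag)
```

The "." separator stays tagged as `O`, so a sequence classifier trained on these tags would learn both the entities and where the delimiters fall.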

I also have a need to classify time series data, so I've been investigating time series annotation tools. Basically, selecting either events (x occurred at timestamp t) or ranges (x occurred from t1 to t2) given a univariate or multivariate time series. Also, my colleagues do research on IoT wearable devices and have been looking for something that can annotate multivariate time-series data from an activity given a video of the wearer (e.g. moving left, moving right, etc.)

I'm happy to provide feedback as I clarify my requirements, and perhaps if I have time I will try to contribute directly. I'm experienced with Python, and I know the basics of React.
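
The two annotation kinds described here (events at a timestamp, ranges between two timestamps) could be modeled roughly as below. This is only a sketch of a possible data structure; the class names, field names, and the `labels_at` helper are hypothetical and do not come from any existing tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EventAnnotation:
    label: str
    t: float  # "x occurred at timestamp t"


@dataclass(frozen=True)
class RangeAnnotation:
    label: str
    t1: float  # "x occurred from t1 ..."
    t2: float  # "... to t2"

    def __post_init__(self):
        if self.t2 < self.t1:
            raise ValueError("t2 must not be earlier than t1")


def labels_at(annotations, t):
    """Return labels that apply at time t: events exactly at t, ranges containing t."""
    active = []
    for annotation in annotations:
        if isinstance(annotation, EventAnnotation):
            if annotation.t == t:
                active.append(annotation.label)
        elif annotation.t1 <= t <= annotation.t2:
            active.append(annotation.label)
    return active
```

Since the annotations only carry timestamps and not channel indices, the same records would apply to univariate and multivariate series alike.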

seveibar commented 4 years ago

Thanks for the context! Really curious about the time series tool. Are there any particular tools you're looking at for time-series data? I'm not totally sure what the interface would look like. I created an issue to discuss this further: #242

harsh306 commented 4 years ago

This is for tracking interest in Active Learning, which means automatically annotating new samples based on the labels that have been completed so far. Please 👍 if this is something you'd like.

From an implementation perspective, I'm thinking we enable active learning in two scenarios:

  • The user is using the desktop application
  • The user has specified a web server that can do the active learning through an API we define, and for which we provide a sample active learning Docker container

Samples that were automatically annotated and not reviewed are marked in a different color.

Curious how people think this could be implemented, please comment with thoughts!

I agree, active learning could be very useful for NER tasks and also image tagging. PyTorch and TensorFlow examples would be of great help.
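
For the web server scenario quoted above, the API has not been defined yet, so the following is only a guess at what a request handler could look like. The JSON field names (`samples`, `id`, `data`, `suggestions`, `confidence`, `reviewed`) and the `predict` callback are assumptions; in practice `predict` would wrap a PyTorch or TensorFlow model.

```python
import json


def handle_predict(request_body, predict):
    """Turn a JSON request of samples into a JSON response of suggested labels.

    `predict` is any callable that takes a sample's data and returns
    (label, confidence); it is a placeholder for a real model.
    """
    payload = json.loads(request_body)
    suggestions = []
    for sample in payload["samples"]:
        label, confidence = predict(sample["data"])
        suggestions.append(
            {
                "id": sample["id"],
                "label": label,
                "confidence": confidence,
                "reviewed": False,  # suggestions start out unreviewed
            }
        )
    return json.dumps({"suggestions": suggestions})
```

Keeping the handler a plain function of a request body makes it easy to mount behind any HTTP framework inside a Docker container, and to test without starting a server.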

VitoriaCarvalho commented 3 years ago

Hello! Congratulations on the project, it is an excellent initiative. 👏 I am also interested in the implementation of Active Learning in the tool. Predicting entities during annotation is very useful, mainly for those who need to annotate a lot of data. I would like to know how the project is progressing with regard to Active Learning. Thanks in advance!