astralapp / astral

Organize Your GitHub Stars With Ease
https://astralapp.com
BSD 3-Clause "New" or "Revised" License
3.18k stars 141 forks source link

import existing stars/tags when self-hosting #232

Open vesper8 opened 5 years ago

vesper8 commented 5 years ago

I've been using this app for almost 3 years so I have over 1000 stars and lots of tags. Today I decided to try the self-hosted solution so I could learn the codebase and possibly add improvements and PRs

Everything works fine and I am glad it loaded my existing Astral data. However I noticed it didn't populate my mysql database with all my existing stars and tags

Thanks to the json export I see it would be possible to create an import script to bring in all the existing data. However I wanted to ask if perhaps this is already possible and I'm missing something. Or if it's somehow a bad idea to do this and it will break my self-hosted installation?

syropian commented 5 years ago

Uh, your self-hosted version has all of your tags? Are you sure you aren't confusing them with the language tags? Those are generated automatically and inferred from your list of stars.

As for importing, I haven't created a solution yet, but it's on my radar. Would definitely make the transition smoother for people moving to a self-hosted solution 👍

vesper8 commented 5 years ago

Yes.. @syropian I noticed that it was the language tags.. and then I edited my issue text.. but I guess you saw the original :)

And I just built the import command : ) and it works wonderfully! My coding is probably not up to your standards so I won't create a PR.. but feel free to use this script to add this functionality of course!

usage: php artisan astral:import vesper8_astral_data.json

<?php

namespace Astral\Console\Commands;

use Astral\Models\Tag;
use Astral\Models\Star;
use Astral\Models\User;
use Illuminate\Console\Command;
use Illuminate\Support\Facades\DB;
use Illuminate\Support\Facades\Schema;

class ImportCommand extends Command
{
    protected $signature = 'astral:import {jsonFile=astral_data.json}';

    protected $description = 'Import your data from the json export';

    public function handle()
    {
        Schema::disableForeignKeyConstraints();
        Star::truncate();
        Tag::truncate();
        Schema::enableForeignKeyConstraints();

        $path = base_path($this->argument('jsonFile'));
        $content = collect(json_decode(file_get_contents($path), true));

        $user = User::first();

        $content->each(function ($starWithTags) use ($user) {
            $starData = [
                'user_id' => $user->id,
                'repo_id' => $starWithTags['repo_id'],
                'notes' => $starWithTags['notes'],
                'created_at' => $starWithTags['created_at'],
                'updated_at' => $starWithTags['updated_at'],
                'autotagged_by_topic' => $starWithTags['autotagged_by_topic'],
            ];

            DB::table('stars')->updateOrInsert([
                'repo_id' => $starWithTags['repo_id'],
            ], $starData);

            $star = Star::where('repo_id', $starData['repo_id'])->first();

            $tags = collect($starWithTags['tags']);

            $tags->each(function ($tag) use ($user, $star) {
                $tagData = [
                    'user_id' => $user->id,
                    'name' => $tag['name'],
                    'sort_order' => $tag['sort_order'],
                    'created_at' => $tag['created_at'],
                    'updated_at' => $tag['updated_at'],
                    'slug' => $tag['slug'],
                ];

                DB::table('tags')->updateOrInsert([
                    'slug' => $tagData['slug'],
                ], $tagData);

                $tag = Tag::where('slug', $tagData['slug'])->first();

                $starTagData = [
                    'star_id' => $star->id,
                    'tag_id' => $tag->id,
                ];

                DB::table('star_tag')->updateOrInsert($starTagData, $starTagData);
            });
        });
    }
}
vesper8 commented 5 years ago

now I can move on to try and get the "select all" functionality working that I'm craving for.. currently you can only select multiple by clicking one at a time while holding shift.. and you can't unselect without losing all of your selections

Plus.. it seems like tagging multiple selections at once isn't working.. it only tags the first even if you drag multiple on top of a tag. I saw issues that seemed to implement this feature but it doesn't work for me

Would love to be able to multi-select and multi-tag more efficiently!

vesper8 commented 5 years ago

I feel like I could perform a whole lot of automatic tagging based on the title (and possibly description) of the repos.. and this would be useful to me. But in order to do that efficiently it would be best if the title and descriptions were stored in the database.. as opposed to just the repo_id. I understand the title and description can change.. but even with that in mind, I think it would be useful and would allow adding some automatic user-personalized tagging rules. Any thoughts on that?

vesper8 commented 5 years ago

Just discovered this magic 'autotag_topics' system.. couldn't find any way to enable it from the settings or anywhere in the code.. it looks like the only way is to set the value manually in the mysql table.. which I did.. and very neat! It created over 3000 tags from my ~1300 stars.. lots of noise and not quite perfect.. but now I understand I can start from there to add some of my own logic. I still find it very useful however. But now there seems to be a need for a 'tag search' now that I have over 3000 tags. It would also be nice to be able to sort the tags by 'repos tagged'

Is this an experimental feature that is still being developed? I can't find any github issues about it either

syropian commented 5 years ago

Thanks for the auto-import script! I'll definitely take a look.

Auto-tagging has been something I work on, on and off, and I never really end up with a solution that I like. It either creates too much noise, or does so much work I'm always a little bit terrified I'll overwork the DB when its applied to a production environment. It's a tough problem, because everyone would likely want it to work a little bit differently.

I'm not 100% sure what's going on with the multi-select tagging, but I have occasionally seen the same issue. For me, sometimes it works, sometimes it doesn't. I'm hoping to squash some of the issues over the Christmas holidays, where I'll actually have some solid time to dedicate to it.

vesper8 commented 5 years ago

Here is the approach I am planning to take to autotag in a way that I like

Haven't decided on how I will write the logic yet, but this is the result I think will work well.

I want to be able to add tags manually and then mark some of them for auto-tagging. For example I would add 'Laravel' and Vue' as two tags that I want autotagging performed on. So I could have a dozen tags, 10 of which aren't included for auto-tagging and 2 that are.

Then I would like to perform the auto-tagging based on rules such as 'tag, case-insensitive, must be included in the repository title'

This is the simplest form, a more complex rule could be associating multiple keywords to a tag and then saying 'in order for this tag to be applied, all keywords must be included in the title and/or description'

But I think without having to add this extra layer of complexity (Adding keywords to tags) and only making sure the tag itself is included in the title and/or description I could achieve something very satisfactory in a short time.

Only applying the autotagging to existing tags (as opposed to all returned topics from the api) should minimize the noise too.

vesper8 commented 5 years ago

Also if you're worried about doing too many inserts with your current autotagging logic, a valid concern IMO, then you could look into implementing a bulk insertOrUpdate. I use this in a lot of my projects where I tend to bulk insert 500 rows at a time. It's nice because it also correctly handles duplicate constraints

This is my QueryBuilder trait that I add to my models to be able to do bulk insertOrUpdates

<?php

namespace App\Traits;

use DB;

trait QueryBuilder
{
    protected $columnString;
    protected $updateString;

    /**
     * Must call this before calling insertOrUpdate
     * @param array $columns
     * @param array $columnsToNotUpdateOnDuplicate
     */
    public function setColumns(array $columns, array $columnsToNotUpdateOnDuplicate=[])
    {
        $this->columnString = $this->buildColumnString($columns);
        $this->updateString = $this->buildUpdateString($columns, $columnsToNotUpdateOnDuplicate);
    }

    /**
     * Execute the INSERT ON DUPLICATE KEY UPDATE mysql statement
     * @param array $valueChunk
     * @param array $columnsToNotUpdateOnDuplicate
     * @return int
     */
    public function insertOrUpdate(array $valueChunk, array $columnsToNotUpdateOnDuplicate=[])
    {
        $this->setColumns(array_keys($valueChunk[0]), $columnsToNotUpdateOnDuplicate);

        $valuePlaceholders = $this->buildValuePlaceholderString($valueChunk);
        $bindings = array_flatten($valueChunk);
        $query = "INSERT INTO {$this->table} ({$this->columnString}) VALUES $valuePlaceholders ON DUPLICATE KEY UPDATE {$this->updateString}";
        return DB::connection($this->connection)->affectingStatement($query, $bindings);
    }

    /**
     * @return string
     */
    protected function buildColumnString(array $columns)
    {
        return '`' . implode('`,`', $columns) . '`';
    }

    /**
     * @param array $columns
     * @param array $columnsToNotUpdateOnDuplicate
     * @return string
     */
    protected function buildUpdateString(array $columns, array $columnsToNotUpdateOnDuplicate=[])
    {
        $updates = '';
        foreach ($columns as $column) {
            if (!in_array($column, $columnsToNotUpdateOnDuplicate))
                $updates .= "`$column`=VALUES(`$column`),";
        }
        return rtrim($updates, ',');
    }

    /**
     * @param array $valueCollection a chunk of values to build placeholders for
     * @return string
     */
    protected function buildValuePlaceholderString(array $valueCollection)
    {
        $placeholder = '';
        foreach ($valueCollection as $attributes) {
            $placeholder .= '(' . str_repeat("TRIM(?),", count($attributes));
            $placeholder = rtrim($placeholder, ',');
            $placeholder .= '),';
        }
        return rtrim($placeholder, ',');
    }
}

And you use it like this

        $bulkInserts = [];

        foreach ($posts as $post) {
            $bulkInserts[] = [
                'unique_id' => $post->unique_id,
                'unique_value' => $post->unique_value,
            ];
        }

        // bulk insert into posts
        (new Post)->insertOrUpdate($bulkInserts);

You can also pass it a 2nd parameter for columns you don't want updated if the record already exists.. example

            $columnsToNotUpdateOnDuplicate = [
                "api_call_id",
                "created_at",
            ];

            (new Post)->insertOrUpdate($bulkInserts, columnsToNotUpdateOnDuplicate);
syropian commented 5 years ago

It's an interesting thought! Some people tend to want bulk tag creation and application. i.e, they'd rather (at least initially) create less tags themselves, and just have tags auto-created and applied as the stars come in. That can get a bit tricky, and the insert counts to build up for people with a lot of stars.

That trait looks pretty cool, I'll definitely try it out once I get back on working on auto-tag stuff. '

Also, if you want to chat further (as this thread is getting a little off the initial topic) you can always join the Discord server! https://discord.gg/puxy6AT

a67793581 commented 4 months ago

Has there been any progress on this feature? I'm eager to use it as well! @syropian