avstudnitz / AvS_FastSimpleImport

Wrapper for Magento ImportExport functionality, which imports products and customers from arrays

Unable to get speed on product importation #279

Open bijalbhavsar opened 8 years ago

bijalbhavsar commented 8 years ago

We are trying to import 1 Lac products in one request, and it takes around 10 hours to import them. Kindly suggest how we can improve performance.

Do we need to improve anything on the server to get better performance? What steps do we have to follow to import 100,000 records in 10 minutes?

Thanks in advance.

avstudnitz commented 8 years ago

We need more information for that:

  • What is "Lac"?
  • How many products are you importing?
  • Which type of products are you importing?
  • How do you start the import?
  • Are you using Partial Indexing?

bijalbhavsar commented 8 years ago

Hi Andreas,

Thank you for your response.

We have around 1,200,000 products to import, and they are all simple products. We have created 12 CSV files with 100,000 products each, created a script in the Magento root folder, and scheduled a cron job to run it.

We are importing 100,000 products at a time, and it takes approx. 10-13 hours to complete, which is too much.

I am not entirely sure about partial indexing.

Regards, Bijal B.

On Friday, January 29, 2016, Andreas von Studnitz notifications@github.com wrote:

We need more information for that:

  • What is "Lac"?
  • How many products are you importing?
  • Which type of products are you importing?
  • How do you start the import?
  • Are you using Partial Indexing?


Greetings, Bijal Bhavsar, Web Developer

bijalbhavsar commented 8 years ago

Hi avstudnitz,

Were you able to look into my concern about speeding up the import?

Awaiting your reply.

Thank You, Bijal

avstudnitz commented 8 years ago

Please try to answer all my questions; your answers are important for our understanding. Regarding Partial Indexing: there is a configuration setting at System -> Configuration -> Fast Simple Import named "Partial Indexing". Try to turn that off; it will speed things up. You will have to do a full reindex afterwards. Generally, 1.2 million is a huge number of products. I don't think Magento runs well with that out of the box. I hope you have at least some indexing optimizations in place, or are using Enterprise Edition.
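
For reference, partial indexing can also be controlled per run when calling the importer from code. A minimal sketch, assuming the setPartialIndexing() and processProductImport() methods shown in the module's README:

    <?php
    // Sketch: disable the extension's partial indexing for this run and
    // do a single full reindex after all chunks have been imported.
    require_once 'app/Mage.php';
    Mage::app('admin');

    $import = Mage::getModel('fastsimpleimport/import');
    $import->setPartialIndexing(false)      // index once at the end, not per bunch
           ->processProductImport($data);   // $data: the product array built from the CSV

    // Afterwards, run a full reindex, e.g. from the shell:
    //   php shell/indexer.php reindexall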

bijalbhavsar commented 8 years ago

Hi avstudnitz,

We already have "Partial Indexing" set to "Off". Is there any other way to speed up the process?

avstudnitz commented 8 years ago

Again: please answer all my questions, you haven't yet.

paales commented 8 years ago

@bijalbhavsar Please provide concrete code examples. Possible bottlenecks are:

Other than that, please answer @avstudnitz's questions. The importer can run pretty fast; the main concern is your indexing speed. If you're running CE, the importer's performance isn't the bottleneck. If you're running EE, your performance should be great.
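
Since indexing is usually the dominant cost on CE, one common approach (standard Magento 1 practice rather than anything specific to this extension) is to switch the indexers to manual mode for the duration of the import and reindex once at the end. A rough sketch:

    <?php
    // Sketch: put all indexers into manual mode before a bulk import so nothing
    // is reindexed per row, then reindex everything once afterwards.
    require_once 'app/Mage.php';
    Mage::app('admin');

    $processes = Mage::getSingleton('index/indexer')->getProcessesCollection();

    foreach ($processes as $process) {
        $process->setMode(Mage_Index_Model_Process::MODE_MANUAL)->save();
    }

    // ... run the product import here, e.g. processProductImport($data) ...

    foreach ($processes as $process) {
        $process->setMode(Mage_Index_Model_Process::MODE_REAL_TIME)->save();
        $process->reindexEverything();   // one full reindex per indexer
    }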

bijalbhavsar commented 8 years ago

@avstudnitz @paales Kindly find my answers to your questions below.

Q: What is "Lac"?
A: Lac = 100,000 records.

Q: How many products are you importing?
A: We need to import approx. 1,200,000 products.

Q: Which type of products are you importing?
A: Simple products.

Q: How do you start the import?
A: We are using a cron job. The script reads records from a CSV file, builds arrays of 1,000 records in chunks, and imports each chunk in a loop using the AvS_FastSimpleImport script.

Q: Are you using Partial Indexing?
A: Yes.

Let me know if you have any further questions.

bijalbhavsar commented 8 years ago

@paales - kindly find below the code snippet, which is executed by the scheduled cron every 15 minutes.

    // Runs from the Magento root via cron; the Mage bootstrap (require 'app/Mage.php';
    // Mage::app();) presumably happens earlier in the full script.
    $range = 100000;    // maximum number of CSV rows to process per run
    $logfilepath = Mage::getBaseDir('var').DS.'log'.DS.'data'.DS.'logs';
    $logfilename = 'feed'.DS.'logs'.DS.'feed_'.date("Y_m_d").'.log';
    if (!file_exists($logfilepath)) {
        mkdir($logfilepath, 0777, true);    // recursive: 'var/log/data' may not exist yet
    }

    function getVisibilityId($visibility)
    {
        $visibilityArray = array(
            'Not Visible Individually' => 1,
            'Catalog' => 2,
            'Search' => 3,
            'Catalog, Search' => 4
            );
        return $visibilityArray[$visibility];
    }

    function getTaxClassID($taxclass)
    {
        $taxclassArray = array(
            'None' => 0,
            'Taxable Goods' => 2,
            'Shipping' => 4
            );
        return $taxclassArray[$taxclass];
    }
    $time = microtime(true);
    $path = "/feeds";    // directory where the feed CSV files are dropped

    // Pick the most recently created "data_full_*" file from the feed directory.
    $latest_ctime = 0;
    $latest_filename = '';
    $mainRowtitle = array();      // maps CSV column title => column index
    $group_pricedata = array();   // group prices per SKU, applied after the main import
    $d = dir($path);
    while (false !== ($entry = $d->read())) {
        $filepath = "{$path}/{$entry}";
        // could do also other checks than just checking whether the entry is a file
        if(strstr($entry, 'data_full_') != false){
            if (is_file($filepath) && filectime($filepath) > $latest_ctime) {
                $latest_ctime = filectime($filepath);
                $latest_filename = $entry;
            }
        }
    } 

    $file = fopen("$path/$latest_filename", "r");
    // $datafeed is presumably populated earlier in the full script (not shown here);
    // 'start_at' is the row offset this run should resume from.
    $start_at = $datafeed['start_at'];
    $break_value = $start_at + $range;

    if(count($datafeed)>0){

     try {
         $data = array();
         $i = 0;
         $ct=0;
         $ij = 0;
         $bk=0; 
         $skus = array();
         $currentDate=date("Y/m/d");
         while(! feof($file))
         {

            $bk++;
            if($bk > $break_value){
              break;
            }
            $row = fgetcsv($file, 0, ';');
            // Every 1,000 accepted rows a new chunk index ($ij) is started; the very
            // first row read is the CSV header and builds the column-title map.
            if (($ct % 1000) == 0) {
                if ($i == 0) {
                    foreach ($row as $key => $titleRow) {
                        $mainRowtitle[$titleRow] = $key;
                    }
                }
                $ij++;
            }
            // Skip rows that were already processed by an earlier run.
            if ($i <= $start_at) { $i++; continue; }

            if(!in_array( $row[$mainRowtitle['SKU']],$skus)){
                $skus []=$row[$mainRowtitle['SKU']]; 
                $productName=trim($row[$mainRowtitle['Product flat name EN']]);
                if(trim($productName)==''){
                  $productName = trim($row[$mainRowtitle['Vendor Name']].' - '.$row[$mainRowtitle['Description']]);
                }
                $categoryid=$row[$mainRowtitle['Category ID']];
                $weight=0.00;
                $weight=trim($row[$mainRowtitle['Weight']]);
                if($weight==''){
                  $weight=0.00;
                }

              $data[$ij][] = array(
                'store' => 'admin',
                '_product_websites' =>'base',
                '_attribute_set' => 'Default',
                '_type' => 'simple',
                '_category' => $categoryid,
                'sku' => trim($row[$mainRowtitle['SKU']]),
                'has_options' => '',
                'name' => $productName,
                'vpn' => trim($row[$mainRowtitle['VPN']]),
                'ean' => trim($row[$mainRowtitle['EAN0']]),
                'price' => trim($row[$mainRowtitle['Dealer Price']]),
                'weight' =>  $weight,
                'status' => (('Enabled' == 'Enabled')?'1':'2'),
                'visibility' => getVisibilityId('Catalog, Search'),
                'tax_class_id' =>getTaxClassID('None') ,
                'manufacturer' => trim($row[$mainRowtitle['Vendor Name']]),
                'description' => trim($row[$mainRowtitle['Description']]),
                'short_description' =>trim($row[$mainRowtitle['Description']]),
                'qty' => 1,
                'is_in_stock' => 1,
                'product_name' =>  trim($row[$mainRowtitle['Product flat name EN']]),
                'store_id' => 0,
                'product_type_id' => 'simple',
                'product_flat_name' =>  trim($row[$mainRowtitle['Product flat name EN']]), 
                'product_status' =>trim($row[$mainRowtitle['Product Status']]),

                );

                $group_pricedata[$row[$mainRowtitle['SKU']]]=array(
                    1=>trim($row[$mainRowtitle['Price A1 incl. fees & vat']]),
                    2=>trim($row[$mainRowtitle['Price D1 incl. fees & vat']]),
                    3=>trim($row[$mainRowtitle['Price D2 incl. fees & vat']])
                    );
                $ct++;
            } 
        }

        fclose($file);
    } catch (Exception $e) {
        $message = $e->getMessage();
        $mtime = Mage::getModel('core/date')->date('Y-m-d H:i:s');
        $message = $message . "\n Error Time: " . $mtime;
        // Log the exception itself; $import does not exist yet at this point.
        Mage::log($message, null, $logfilename);
    }

    try {                 
        $i = 0;
        $totalct=0;
        foreach($data as $chunkData) {

          /** @var AvS_FastSimpleImport_Model_Import $import */
          $import = Mage::getModel('fastsimpleimport/import');
          $result = $import->processProductImport($chunkData);

            foreach ($chunkData as $productdata) {
                // import_group_price() appears to be a custom method layered on top of
                // the stock module; group prices are applied per SKU after the import.
                $import->import_group_price($productdata['sku'], $group_pricedata[$productdata['sku']]);
                $totalct++;

            }
        }
        $finalmessage = 'Total imported products: ' . $totalct . '  Elapsed time: ' . round(microtime(true) - $time, 2) . 's' . "\n";
        // Log the summary and any importer errors to the log file
        // ($finalmessage is a message, not a log file name).
        Mage::log($finalmessage, null, $logfilename);
        Mage::log($import->getErrorMessages(), null, $logfilename);
    } catch (Exception $e) {
        $message = $e->getMessage();
        $mtime = Mage::getModel('core/date')->date('Y-m-d H:i:s');
        $message = $message . "\n Error Time: " . $mtime;
        Mage::log($message, null, $logfilename);
        Mage::log($import->getErrorMessages(), null, $logfilename);
    }
    } // end if (count($datafeed) > 0)

Let me know your views on this. We have removed a few additional attributes from the $data array to reduce the size of the code above.

avstudnitz commented 8 years ago

Importing only 1,000 products at a time might be the bottleneck. Try to put that number as high as possible and if the memory runs out, increase the memory limit. I'd try to aim for 100,000 products in one chunk.
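
A minimal sketch of that suggestion, assuming the rows have already been parsed from the CSV into a hypothetical $rows array (the chunk size and memory limit below are only example values):

    <?php
    // Sketch: import in much larger chunks and raise the memory limit for the CLI run.
    ini_set('memory_limit', '4G');         // example value; tune to the server

    $chunkSize = 100000;                   // instead of 1,000 rows per chunk
    foreach (array_chunk($rows, $chunkSize) as $chunk) {
        $import = Mage::getModel('fastsimpleimport/import');
        $import->processProductImport($chunk);   // one import call per large chunk
    }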

bijalbhavsar commented 8 years ago

Thanks @avstudnitz - let me try that and get back to you.

bijalbhavsar commented 8 years ago

@avstudnitz we have updated the chunk size from 1,000 to 50,000, parsed a 50,000-product CSV file, and executed the script - it took 16,368.98 seconds to import 49,445 products.

Please suggest a workaround to improve the import speed.

paales commented 8 years ago

@bijalbhavsar Can you please run a profiler? The slowness you are experiencing isn't normal. I suggest running the import with XHProf on a small product set to debug this issue.
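
A rough sketch of such a profiling run, assuming the xhprof PHP extension is installed ($smallSample is a hypothetical array of a few hundred product rows, and the output path is just an example):

    <?php
    // Sketch: profile a small import run with XHProf to see where the time goes.
    xhprof_enable(XHPROF_FLAGS_CPU | XHPROF_FLAGS_MEMORY);

    $import = Mage::getModel('fastsimpleimport/import');
    $import->processProductImport($smallSample);   // $smallSample: small hypothetical data set

    $profile = xhprof_disable();
    // Dump the raw profile; it can be viewed with the xhprof_html UI.
    file_put_contents('/tmp/import_profile.xhprof', serialize($profile));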

bijalbhavsar commented 8 years ago

It is taking around 4 hours to import 200K records without inventory (with indexing enabled), as we are using the Magestore Inventory Management extension for multiple warehouses and plants.

We found that if we enable the code for inventory importation, it takes around 10-11 hours to import 200K records with inventory.